EDITOR'S NOTE

This  number  of  the  Studies  in  Education  presents  ma- 
terial collected  and  developed  in  an  unexpected  manner. 
The program of the 1917 Summer Courses of the Johns
Hopkins  University,  under  the  writer's  direction,  included 
a  Demonstration  School  of  six  grades,  designed  to  be  used 
by  students  for  the  observation  of  teaching  in  connection 
with  several  university  courses  in  elementary  education.  It 
included grades one and two, taught by Miss Ida V. Flowers;
four,  by  Miss  Maude  B.  Smith;  five,  by  Miss  Helen  M. 
Burnett;  six,  by  Miss  Matilda  Srager;  and  seven,  by  Miss 
Julia  F.  Beck.  One  hundred  fifty-five  children  were  en- 
rolled, the  majority  of  whom  were  pupils  in  the  Baltimore 
public  schools  who  had  failed  of  promotion  in  June,  and 
who  hoped  by  the  six  weeks'  study  to  make  their  grades  in 
September.  Special  teaching  difficulties  thus  existed  at  the 
opening  of  the  Demonstration  School. 

Instruction in Experimental Education (Education 1)
was  given  in  a  university  course  by  Professor  Bird  T. 
Baldwin,  then  of  Swarthmore  College,  now  Director  of  the 
Iowa  Child  Welfare  Research  Station.  This  course  pre- 
sented methods  of  educational  measurements,  with  applica- 
tion to  problems  in  the  fields  of  physical  growth,  testing  the 
growth  of  general  intelligence,  and  the  degrees  of  attain- 
ment by  pupils  in  various  school  subjects.  The  class  con- 
sisted of  twenty-three  students  of  graduate  or  senior  college 
standing  who  were  taking  the  course  for  the  first  time. 

A  belief  in  the  special  value  of  the  results  of  educational 
measurements  to  the  instructional  problems  of  the  grade 
teacher brought the instructor and students of Education 1
and  the  teachers  and  pupils  in  the  Demonstration  School 
together  as  investigators  in  a  laboratory.  The  teachers  of 
the grades, especially four to seven, were confronted with
the varying needs of the children which were to be met as
fully  as  possible  during  the  six  weeks  the  school  was  in  ses- 
sion. By  measuring  and  testing  the  children,  many  of  their 
individual  needs  were  defined,  and  the  information  turned 
over  promptly  to  the  teachers  for  the  guidance  of  their  in- 
struction. A  large  cooperative  enterprise  thus  ensued  dur- 
ing the  session,  to  which  many  persons,  both  instructors 
and  students,  contributed. 

While  cooperating  in  the  realization  of  the  teaching  aims 
of  the  Demonstration  School,  Professor  Baldwin  succeeded 
admirably  in  showing  how  a  university  summer  course  in 
experimental  education  can  be  organized  so  as  to  advance 
beyond  instruction  to  investigation.  Members  of  his  class 
were  assigned,  by  groups  and  individually,  to  problems  in 
accordance  with  their  special  interests  and  previous  train- 
ing. He  directed  the  giving  of  the  tests  and  the  formula- 
tion of  the  results,  while  his  students  are  responsible  for 
the  details,  order,  and  preservation  of  the  data  and  the  con- 
clusions of  their  individual  studies.  The  correlations  have 
been  most  carefully  checked  up,  and  the  findings  are  be- 
lieved to  be  accurate.  Professor  Baldwin's  early  entrance 
upon  Government  service  during  the  war  greatly  delayed 
the  publication  of  these  results.  It  is  hoped  that  the  selected 
twelve  studies  presented  herewith  will  offer  material  and 
findings  of  special  comparative  value,  and  give  additional 
impetus  to  the  experimental  movement  in  education. 

The  realization  of  the  original  teaching  aims  of  the  Dem- 
onstration School  was  largely  due  to  the  valuable  assistance 
of  Miss  Florence  E.  Bamberger.  The  manuscript  has  been 
read  in  its  entirety  by  her  and  by  Dr.  Buford  J.  Johnson, 
and  in  part  by  Capt.  Richard  M.  Elliott.  Special  assistance 
has  been  rendered  by  Miss  Agnes  Snyder  and  Mr.  William 
R.  Flowers. 

Edward  F.  Buchner. 


INTRODUCTION  AND  SUMMARY 
Bird  T.  Baldwin. 

The  papers  assembled  in  this  Study  give  a  diagnostic  pic- 
ture of  129  out-of-step  pupils  who  represent,  in  most  in- 
stances, examples  of  maladjustment  in  educational  progress. 
Modern  experimental  education  orientates  from  the  phys- 
ical and  mental  development  of  the  children  who  are  being 
taught,  as  well  as  from  the  subject  matter  of  instruction, 
and  the  results  of  these  studies  show  the  wide  range  of  indi- 
vidual differences  among  children  and  the  limitations  of 
educational  procedure  designed  to  meet  these  differences. 

In  order  to  understand  the  conditions  which  aid  or  hinder 
education,  it  is  essential  that  teachers  with  scientific,  pro- 
fessional training  should  focus  their  attention  upon  the 
school-room  situation,  and  analyze  it  into  its  various  as- 
pects. This,  in  short,  has  been  the  aim  of  this  brief  pre- 
liminary introduction  to  experimental  education.  The  results 
are  limited  to  the  data  at  hand  and  no  attempt  has  been 
made  to  formulate  general  conclusions  beyond  the  results 
included  in  the  study. 

Value  would  have  been  added  to  the  monograph  if  the  ob- 
servations had  been  continued  throughout  a  year  or  a  series 
of  years,  instead  of  a  few  weeks,  and  it  is  hoped  that  this 
work  will  encourage  students  to  undertake  such  investiga- 
tions. It  is  expected  that  trained  experimenters  will  soon 
give  us  a  more  complete  study  on  a  similar  basis,  if  educa- 
tion is  going  to  become  a  science  with  experimental  aspects. 
These  may  in  a  limited  sense  serve  as  type  studies  of  a 
suggestive nature. They have, however, been prepared and
published essentially for the benefit of the students who
made them, in order that the students may have the advantage
of the unity of the course and the opportunity to improve
on their own work. Publication has been delayed on
account of the recent war.

A.  Physical  Measurements 

The  129  children  included  in  this  school  are,  on  the  average, 
an inferior group when compared with the writer's norms
obtained  from  consecutive  measurements  of  children  who 
have  had  continuous  school-medical  inspection,  physical 
training  and  directed  play.  These  tentative  norms  are  given 
in  Charts  I  to  IV  in  order  that  comparisons  may  be  made. 
The  physical  status  of  the  Demonstration  School  children 
may  be  found  on  pages  22-26  of  Miss  Campbell's  Report. 

B.  Mental  Measurements 

The term "mental age" is a gross blanket statement, since
mental traits, abilities, interests and psychomotor reactions
are not found equally developed in the so-called normal
children of the same "mental age." That is, fundamentally,
a scale graduated into groups or steps of "mental ages" is
a rough approximation of what a number of children average,
not what they are.

The  measuring  scales  for  intelligence,  like  those  for  sub- 
ject matter,  represent  tentative  approximations  and  not  final 
fixed  units.  If  all  so-called  normal  children  or  a  large  per- 
centage of  them  passed  these  particular  tests  we  would  have 
to  raise  our  norm  by  increasing  the  difficulty  of  the  tests. 

The  author  of  the  Stanford  Revision  of  the  Binet  Test 
states  that  the  five  or  six  tests  that  represent  the  "mental 
age  "  of  seven,  for  example,  are  the  five  that  50  per  cent  of 
a  supposed  group  of  normal  children  can  pass.  The  other 
50  per  cent  might  pass  six  other  tests  equally  well.  The 
point  is  that  these  six  tests  do  not  make  a  mental  age  as 
many  are  prone  to  think.  It  is  the  child  that  is  normal  and 
not  the  six  tests.  The  tests  are  devices  which  catch  certain 
combinations of traits, and "mental levels" are generalizations.
The scales are too gross in their present forms for
careful psychological diagnosis.

That  a  "  mental  age  "  is  not  a  cross  section  of  a  combina- 
tion of  traits  which  fall  within  an  average  but  a  range  of 
traits  and  a  combination  of  traits  may  be  shown  experiment- 
ally. A  "  point  scale  "  allows  for  individual  differences  and 
has  psychological  advantages  over  a  step  scale.  A  "point 
scale  "  should  allow  for  a  finer  gradation  of  points  than  the 
Yerkes  Scale  and  a  wider  range  of  tests  if  a  general  analysis 
of  intelligence  is  the  purpose.  The  new  Yerkes  Adolescent 
Scale  offers  some  advantages  here. 

It  will  be  seen  that  the  Binet-Simon-Goddard-Terman 
scales  are  based  fundamentally  on  a  different  point  of  view 
of  mental  development  than  the  Huey-Yerkes-Bridges  Point 
Scales.  The  one  assumes  that  in  the  normal  child  the  mind 
develops  in  pronounced  stages  or  nodes  and  these  nodes 
correspond  in  the  main  with  certain  chronological  ages ;  the 
other  is  based  on  the  presupposition  that  some  traits  may  or 
may  not  develop  before  others,  any  or  all  may  develop  grad- 
ually— but  the  scale  gives  the  credit  for  what  is  found,  not 
what  is  supposed  to  be  present  at  a  given  age. 

The  scales  have  been  and  are  serving  a  good  purpose  if 
we  remember  that  we  are  psychologists  who  see  the  child 
beyond  the  scale.  The  gathering  together  of  the  tests,  the 
attempt  to  formulate  norms  and  the  practical  use  of  these 
norms  have  all  been  worth  while.  In  the  future  there  will 
probably  be  a  return  to  the  laboratory  type  of  psychological 
experiments  where  individual,  specific  traits  are  studied  in- 
tensively as  before,  but  with  a  much  wider  insight  into  their 
meaning  which  heretofore  has  been  seldom  considered  and 
little  understood. 

The  writer  sees  need  in  the  future  of  the  differentiation 
and  refining  of  the  scales  into  a  series  of  graduated  tests  for 
each  trait  or  ability.  It  will  be  possible  by  a  combination  of 
scores  to  make  up  a  norm.  For  example,  there  should  be  a 
graduated  series  for  auditory  memory  of  digits,  for  words, 
for  sentences,  or  pictures,  etc.,  and  also  for  motor  control, 
for different types of judgment, and so on.

Normal children differ greatly in mental characteristics,
and  it  is  this  range  of  individual  differences  that  makes  up 
the  class  which  we  should  call  normal  children,  and  not  an 
average  of  them.  It  is  the  average  of  this  child's  own  traits 
that  determines  whether  or  not  he  is  normal.  This  may  be 
proven  experimentally  by  the  examination  of  the  records  of 
the 129 children. These pupils have been given the Point
Scale tests as shown in Table 1. If on the basis of these
tests, the individuals are grouped according to "mental
ages," it is found that there is a remarkable overlapping of
abilities. For example, tests that are designed by Binet,
Goddard and Terman, for 7 to 11 years of age are often
very poorly executed by children that mentally score ages
from 12 to 16. To be more specific, take the test for drawing
the two designs placed by Binet for 11 years and by
Terman for 10 years of age. In Table 1 are a large number
of children that are 12, 13, 14, 15, 16 or 17 years old mentally
according to the rating of the mental tests that do not
make a passing score on this test. The same is true of every
other test in any series except the very easy ones.

This  overlapping  in  abilities  is  very  evident  in  an  unpub- 
lished investigation  made  by  the  writer  of  1,500  delinquent 
children. 

C.  Correlations 

In  the  study  of  psycho-educational  problems  and  proc- 
esses, it  is  frequently  desirable  to  measure  the  relationship 
which  may  exist  between  two  series  of  observations,  tests, 
or  measurements.  It  is  most  difficult,  if  not  impossible, 
to  estimate  this  relationship  by  simply  observing  the  two 
series.  The  relationship  may  conveniently  be  expressed  by 
means of a single numerical expression or coefficient of
correlation. Probably the most satisfactory coefficient is that
devised by Karl Pearson. The formula is

$$r = \frac{\sum xy}{\sqrt{\sum x^{2}}\,\sqrt{\sum y^{2}}}$$

where x is the deviation from the arithmetic average (signs
considered) for one series and y is the deviation from the
arithmetic average for the other series. The series are arranged
so that the corresponding items are opposite each
other  and  the  product  xy  is  for  the  deviation  of  the  corre- 
sponding items.  For  example,  in  correlating  the  results  of 
two  measuring  scales,  the  amount  which  a  child's  record 
deviates  from  the  average  record  in  one  test,  is  multiplied 
by  the  amount  which  the  same  child's  record  deviates  from 
the  average  in  the  other  test.  The  sum  is  taken  algebraically. 
If the two series correspond exactly in their deviations in
the same direction, the result is complete positive correlation,
or +1. If the two series correspond exactly in
their variations, but in opposite directions, the result is
completely negative correlation, or −1. If no relation exists
between the two series, the degree of correlation is 0.
Intermediate values may exist anywhere between −1 and +1.
The probable error of a coefficient has been carefully worked
out and may be obtained by the formula

$$\text{P.E.} = \frac{0.6745\,(1 - r^{2})}{\sqrt{n}}$$

where n is the number of pairs of items. It indicates that the
chances are even that the coefficient lies between r plus the
probable error and r minus the probable error. The size of the
probable error always varies inversely with the size of the
coefficient and with the number of items. The coefficient should
be over .30 to show correlation, and .50 or over indicates decided
correlation if the coefficient is at least six times the probable
error.
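
The two formulas can be tried on a small example. The sketch below is an illustration only, written in Python. It assumes the usual product-moment form r = Σxy / (√Σx² · √Σy²) for the coefficient and 0.6745(1 − r²)/√n for its probable error. The function names and the five sample pairs are hypothetical and are not taken from the records of the study.

```python
import math


def pearson_r(series_a, series_b):
    """Pearson coefficient computed from deviations about each series' average."""
    if len(series_a) != len(series_b) or len(series_a) < 2:
        raise ValueError("need two series of equal length with at least two pairs")
    mean_a = sum(series_a) / len(series_a)
    mean_b = sum(series_b) / len(series_b)
    # x and y are the signed deviations of corresponding items.
    x = [a - mean_a for a in series_a]
    y = [b - mean_b for b in series_b]
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    # A series with no variation gives a zero denominator and raises ZeroDivisionError.
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def probable_error(r, n):
    """Probable error of r for n pairs, assuming the form 0.6745 (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Hypothetical scores for five children on two tests (illustration only).
scores_a = [1, 2, 3, 4, 5]
scores_b = [2, 4, 6, 8, 10]

print(pearson_r(scores_a, scores_b))                  # deviations agree in direction
print(pearson_r(scores_a, list(reversed(scores_b))))  # deviations opposed
print(round(probable_error(0.915, 57), 4))            # r and n as quoted for one pair of scales
```

The first two calls give the limiting cases of complete positive and complete negative correlation. The last call uses a coefficient of .915 on 57 pairs and comes out near .0145, in line with the probable error of about .014 printed for that pair in Table 3.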

In  order  to  determine  whether  or  not  the  indices  of  intel- 
lectual ability  as  measured  by  the  Yerkes-Bridges  scale  in 
this  particular  group  of  children  showed  any  direct  relation- 
ship to  the  physical  measurements  of  height,  weight,  grip 
and  breathing  capacity,  coefficients  of  correlation  were  de- 
termined by  means  of  the  Pearson  formula  for  all  meas- 
urements of  67  boys  ranging  from  6  to  16  years  of  age,  and 
60  girls  ranging  from  5  to  16  years  of  age. 

The results show no correlation for the boys, since the
coefficient for intellectual ability and height is −.228; for
intellectual ability and weight −.135; for intellectual ability
and grip −.226; and for intellectual ability and lung
capacity −.197, with the probable errors ±.078, ±.081, ±.078,
and ±.079, respectively. In other words, with this group of
children there is no evidence that physical growth as indicated
by height, weight, grip, and breathing capacity shows
a positive or negative relationship with intellectual ability as
indicated by the Yerkes-Bridges Scale for measuring intelligence.
For the I. Q. (Stanford), the coefficients are −.351,
−.287, −.293, −.314, respectively. Here there is a slight
negative correlation.

For the 60 girls, results indicate very slight correlation
with the tendency toward the negative direction. For
intellectual ability and height the coefficient is −.307, with
the probable error of ±.079; for intellectual ability and
weight −.166 with a probable error of ±.085; for
intellectual ability and grip −.190 with a probable error of
±.084; for intellectual ability and breathing capacity −.069
with a probable error of ±.087. For the I. Q. (Stanford),
the coefficients are −.449, −.377, −.397, −.343, respectively.
Here there is more marked negative correlation.
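
The rule of thumb stated earlier (a coefficient over .30, and at least six times its probable error) can be applied to the Yerkes-Bridges coefficients just reported. The following Python sketch is an illustration only and gives one possible reading of that rule. The dictionary layout and the helper names `ratio_to_probable_error` and `shows_correlation` are assumptions; the figures are those quoted in the two preceding paragraphs.

```python
# Coefficients (r) and probable errors (P.E.) quoted in the text for
# intellectual ability (Yerkes-Bridges) against four physical measures.
reported = {
    "boys": {
        "height": (-0.228, 0.078),
        "weight": (-0.135, 0.081),
        "grip": (-0.226, 0.078),
        "breathing capacity": (-0.197, 0.079),
    },
    "girls": {
        "height": (-0.307, 0.079),
        "weight": (-0.166, 0.085),
        "grip": (-0.190, 0.084),
        "breathing capacity": (-0.069, 0.087),
    },
}


def ratio_to_probable_error(r, pe):
    """How many times its probable error the coefficient is, ignoring sign."""
    return abs(r) / pe


def shows_correlation(r, pe, minimum=0.30, multiple=6.0):
    """One reading of the rule: |r| over .30 and at least six times the P.E."""
    return abs(r) > minimum and ratio_to_probable_error(r, pe) >= multiple


for group, measures in reported.items():
    for measure, (r, pe) in measures.items():
        ratio = ratio_to_probable_error(r, pe)
        print(f"{group:5s} {measure:18s} r={r:+.3f} r/P.E.={ratio:4.1f} "
              f"meets rule: {shows_correlation(r, pe)}")
```

On this reading none of the eight coefficients reaches six times its probable error, which agrees with the description of little or no correlation between the physical measurements and intellectual ability.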

In interpreting results from these data, it must be borne in
mind  that  this  group  represents  both  retarded  and  accel- 
erated pupils,  that  while  many  children  came  to  school  be- 
cause they  were  below  grade,  others  came  to  take  advantage 
of  the  opportunity  for  promotion  which  would  follow  the 
completion  of  one  or  more  subjects.  The  girls  are  inferior 
to  the  boys,  mentally. 

Since  these  same  children  had  been  tested  by  thirteen 
scales  designed  to  measure  degree  of  attainment  in  subject 
matter,  the  relationship  between  the  results  of  these  tests 
and  intellectual  ability  was  determined.  The  coefficient  of 
correlation  for  each  scale  and  intellectual  rating  as  meas- 
ured by  the  Stanford  Revision  Scale  and  by  the  Yerkes 
Bridges  Scale  was  obtained  for  boys  and  girls  separately. 
The  coefficients  with  their  probable  errors  are  given  in 
Table 2.

As a whole there is little indication of correlation between
intellectual rating and the different measuring scales. In
the case of the three handwriting scales there is no correlation
for either boys or girls between handwriting ability and
the intelligence quotients or intellectual ability, since the
coefficients vary between −.215 and +.203. Also, there is
no correlation between either the intelligence quotients or
intellectual ability and the Woody tests for abilities in addition,
subtraction, multiplication, and division, the coefficients
being very close to zero except for the girls' coefficients for
I. A., and there the highest coefficient is +.305 with a probable
error of ±.085. With the English composition scale
there is no evidence of correlation and for the Trabue completion
scale the coefficient is low. There apparently is slight
correlation between intellectual ability for the girls and
Ayres' spelling, since the coefficient is +.351, but this can
be given little meaning, since the correlation for the boys is
only +.099 and for the intelligence quotients with boys and
girls +.160 and +.137 respectively. The coefficients of
correlation between Starch's spelling and the Stanford
intelligence quotients are positive but low, ranging from
+.127 to +.320. With Starch's Comprehension Scale and
the Kansas Silent Reading Test, a positive correlation is
found though it is not high. The correlation between the
intelligence quotients and the intellectual ability for the girls
and their records in the Starch comprehension scale is good,
for the coefficients are +.569 and +.474 respectively with
probable errors of ±.065 and ±.074. For the boys, however,
there is little or no correlation with this scale. For them, the
coefficients are +.437 for the Kansas Silent Reading Test and
the intelligence quotients (I. Q.), with +.341 for the Kansas
Silent Reading Test and intellectual ability (I. A.); for
the girls, the coefficient for the Kansas Silent Reading Test
and the intelligence quotients (I. Q.) is +.483, and for the
Kansas Silent Reading Test and intellectual ability (I. A.)
+.434. This test is, therefore, more closely correlated with
general intelligence than any other measuring scale included
in this list.

D. Measuring Scales in Subject Matter

A number of inter-correlations for the measuring scales
were computed and these coefficients with their probable
errors are given in Table 3.

TABLE 3
Inter-Correlations for Scales Measuring Degree of Attainment in Subject Matter

                                                         Boys                  Girls
Correlation Between                                      Coef.   P. E.   No.   Coef.   P. E.   No.
Woody Addition and Woody Subtraction                     +.814   ±.031   54    +.776   ±.040   45
Woody Addition and Woody Multiplication                  +.748   ±.040   55    +.774   ±.038   52
Woody Addition and Woody Division                        +.776   ±.036   55    +.697   ±.048   52
Woody Subtraction and Woody Multiplication               +.763   ±.038   56    +.853   ±.027   45
Woody Subtraction and Woody Division                     +.812   ±.031   56    +.780   ±.038   47
Woody Multiplication and Woody Division                  +.795   ±.033   58    +.805   ±.032   54
Ayres Handwriting and Thorndike Handwriting              +.915   ±.014   57    +.612   ±.057   54
Ayres Handwriting and Freeman Handwriting                +.903   ±.016   57    +.593   ±.059   54
Thorndike Handwriting and Freeman Handwriting            +.875   ±.021   57    +.396   ±.077   54
Trabue Completion Scale and Kansas Silent Reading Test   +.217   ±.084   58    +.301   ±.085   53
Starch Comprehension and Composition                     +.116   ±.090   54    +.264   ±.090   48
Ayres Spelling and Starch Spelling                       +.824   ±.029   56    +.756   ±.039   53

In writing, the Ayres, Thorndike, and Freeman Scales were
correlated each with each and the results thus obtained show
very high correlation for the boys, the coefficients being
+.915, +.903, and +.875. In the case of the girls, there is
positive but not very high correlation, the coefficients being
+.612, +.593, and +.396. This may indicate that the
handwriting scales are graded more in accordance with boys'
writing than with girls' writing. In Arithmetic the Woody
tests for addition, subtraction, multiplication, and division
were correlated each with the other, and in every case the
coefficient shows a decided positive correlation, equally high
for boys and girls. The highest coefficient is +.853 with a
probable error of ±.027 and the lowest is +.697 with a
probable error of ±.048. In spelling very high correlation
between the Ayres and Starch scales is shown by the
coefficients +.824 and +.756 for boys and girls respectively.
The Starch Comprehension Scale when correlated with the
Composition Scale gives coefficients of +.116 and +.264 for
the boys and girls, or no correlation. There is little
correlation between the Trabue Completion Scale and the
Kansas Silent Reading Test, since the coefficients for the
boys and girls are only +.217 and +.301. Therefore, all the
high correlations are between very closely related tests, such
as the three handwriting scales, the four Woody Arithmetic
tests and the two spelling scales.

E.  A  Developmental  Graph  of  the  Traits  of  an 
Accelerated  Boy  and  a  Retarded  Boy 

This  circular  diagram  is  designed  to  show  relatively  and 
comparatively  the  individual  degrees  in  attainment  in  phys- 
ical measurements,  mental  tests  and  various  scales  for  meas- 
uring ability  in  subject  matter.  A  group  of  60  boys  and  an- 
other of  55  girls  were  measured  and  tested  individually  in 
twenty-two  physical  and  mental  traits.  The  group  was  com- 
posed of  pupils  of  the  4th  to  7th  grades  inclusive.  The  av- 
erage score  for  the  whole  group  was  obtained  in  each  trait 
or  test  and  this  value  is  given  in  terms  of  100  per  cent  and 
posited  as  a  working  norm  for  comparative  ratings.  Dia- 
grammatically,  this  is  represented  by  the  heavy  circle  with 
radii representing 100 per cent accomplishment. The
individual's rating for any one test is divided by the group
average for that test, which gives the per cent of this average
score for the individual. This is indicated on the diagram
by the length of a radius. For example, the average
chronological age for this group of boys is 12 years and 8 months.
The accelerated boy in the diagram was only 11 years and 3
months, or 89 per cent of the average age. But his mental
age as scored by the Stanford Revision Scale was 14, with
an average for the group of 12 years and 2 months. Therefore,
his mental age by the Stanford Revision Scale is represented
on the circle by 115 per cent. His mental age by
the Yerkes Bridges Point Scale is 16 or 116 per cent of the
average Point Scale age of 13 years and 9 months. In every
case, the length of the radius indicates the per cent of the
average, and the results in all tests are thus made comparable.

CHART V. A Developmental Graph Showing Per Cent of
Attainment in Terms of Group Averages
(Legend: Group Average, or Norm; Accelerated Boy; Retarded Boy.)
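
The percentages quoted for the accelerated boy can be checked by converting each age to months and dividing by the group average. The Python sketch below is only an illustration of that arithmetic; the function names are hypothetical, and rounding to the nearest whole per cent is an assumption about how the figures were prepared.

```python
def age_in_months(years, months=0):
    """Convert an age given in years and months to months."""
    return 12 * years + months


def percent_of_group_average(individual, group_average):
    """Individual score as a whole-number per cent of the group average."""
    return round(100 * individual / group_average)


# Ages quoted in the text for the accelerated boy and for the group of boys.
chronological = percent_of_group_average(age_in_months(11, 3), age_in_months(12, 8))
stanford = percent_of_group_average(age_in_months(14), age_in_months(12, 2))
point_scale = percent_of_group_average(age_in_months(16), age_in_months(13, 9))

print(chronological, stanford, point_scale)  # 89 115 116
```

Each result is the length of the corresponding radius on the diagram, with the heavy circle standing at 100.
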

The diagram shows two individual records selected from
the complete series of 115, an accelerated boy and a retarded
boy of approximately the same chronological ages, for 22
traits including:

1.  Chronological  Age 

2.  Standing  Height 

3.  Weight 

4.  Lung  Capacity 

5.  Grip  of  Right  Hand 

6.  Yerkes  Bridges  Point  Scale 

7.  Coefficient  of  Intellectual  Ability 

8.  Stanford  Revision  Age 

9.  Intellectual  Quotient 

10. Woody's Addition

11. Woody's Subtraction

12.  Woody's  Multiplication 

13.  Woody's  Division 

14.  Ayres'  Handwriting 

15.  Thorndike's  Handwriting 

16.  Freeman's  Handwriting 

17.  Kansas  Silent  Reading  Test 

18.  Starch's  Comprehension  Test 

19.  Ayres'  Spelling  Test 

20.  Starch's  Spelling  Test 

21.  Trabue  Completion  Scale 

22.  Composition  Test 

It  will  be  noted  that  there  are  wide  ranges  of  degrees 
of  accomplishment  in  each  trait  for  each  boy;  in  physical 
measurements,  and  in  writing  the  retarded  boy  excels  the 
accelerated  pupil.  In  the  other  traits  the  accelerated  pupil 
is  decidedly  superior  to  the  retarded  boy. 

It  should  be  noted  that  these  standards,  norms,  scales  and 
tests  represent  tentative  approximations  and  not  fixed  nor 
final  units.  Some  have  been  obtained  through  the  consensus 
of  opinion  of  educators  and  school  men  and  represent  an 
"average"  point  of  view,  as  is  illustrated  by  a  few  of  the 
English Scales. Some represent the average or median
attainments of pupils in representative schools, in certain
aspects of a subject; of these the Courtis or Woody tests are
examples.  Some  represent  the  results  of  a  psychological 
and  educational  analysis  of  the  perceptions  involved  in  the 
learning  process,  such  as  the  Freeman  Scale  in  Writing. 
Some  are  types  of  psychological  experiments  which  have 
long  been  used  in  laboratories,  as  Trabue's  use  of  the  Eb- 
binghaus  Completion  Test,  while  others  represent  inductive, 
consecutive  studies  on  a  limited  number  of  selected  children, 
as  Baldwin's  Growth  Scores. 

The  fundamental  point  to  be  understood  is  that  none  of 
these  standards  are  final  or  fixed  and  furthermore  that  they 
should  not  be.  The  educative  process  is  a  changing,  pro- 
gressing, developing  process  within  the  individual,  and  the 
same  principle  of  growth  and  adjustment  should  hold  within 
a  school  system  from  year  to  year  and  from  generation  to 
generation.  All  of  these  scales  are  more  or  less  mechanical 
devices  to  help  foster  growth  within  the  individual,  within  a 
school  grade,  and  within  a  school  system.  All  are  means 
and  not  ends.  Each  of  the  scales  gives  cross  section  points 
of  view  from  particular  angles,  and  it  may  happen  that  two 
scales  in  the  same  subject  will  contradict  each  other,  and 
yet  each  be  correct  and  valuable  for  the  particular  purpose 
for  which  it  was  designed.  The  scales  and  tests  are  val- 
uable as  checks  and  guides ;  but  they  should  not  be  set  up  as 
permanent  goals  nor  even  as  immediate  ends  for  all  chil- 
dren, for  education  is  an  individual  matter  and  the  individ- 
ual capacities  vary  widely.  There  is  always  a  wide  range 
of  individuals  within  a  class,  but  never  an  average  individ- 
ual who  represents  all  of  these  differences.  Teachers  and 
parents  must  learn  to  think  in  terms  of  the  range  of  indi- 
vidual differences.  At  the  same  time  an  effort  should  be 
made  to  bring  the  class  as  a  whole  up  to  a  certain  degree  of 
attainment  as  represented  by  some  particular  scale.  It  is 
conceivable  that  a  child  might  excel  in  one,  two,  or  even 
more  of  the  scales  and  still  be  inferior  in  other  phases  of  the 
subject. This is demonstrated by the fact that there is much
overlapping in the results obtained by the lower and upper
grades  and  also  at  the  earlier  and  later  ages. 

The  science  of  Experimental  Educational  Psychology  is 
worthy of detailed consecutive study under standardized
conditions which can be controlled, repeated, modified and
compared. The next logical step is to determine what traits
and  processes  are  being  measured  and  to  evaluate  their  sig- 
nificance in  physical  and  mental  development. 


II 

PHYSICAL  MEASUREMENTS 
Laura  Winder  Campbell  and  Harry  J.  Kefauver 

This  investigation  deals  with  the  physical  characteristics 
of  one  hundred  and  forty-one  children  in  the  Johns  Hopkins 
Summer  Demonstration  School  including  grades  I,  II,  IV, 
V,  VI,  and  VII ;  and  compares  them  with  the  norms  for  the 
corresponding  chronological  ages  as  established  by  former 
investigators. 

The measurements taken were the standing height in stocking feet;
sitting height; weight, from which two pounds and a half were
deducted for clothing; grip, tested three times alternately with
each hand, of which the highest measure was used for each hand; and
breathing capacity, tested three times, with the highest result
used. The stadiometer was used to measure the height, the hand
dynamometer for the grip, the scales for the weight, the wet
spirometer for the breathing capacity. The age of the nearest
birthday was adopted. The data for each individual were arranged on
a card so that in compiling results, the cards could be shuffled
according to age, grade, height, weight, etc. The results were
computed separately for each sex; the measurements were averaged for
each age and also for each scholastic grade. Tables for each
measurement are submitted showing the range of distribution,
medians, and deviations for each age group. Graphs have also been
plotted showing average measurements for each group in comparison
with the Baldwin and Smedley norms for height, weight, and breathing
capacity, and for right and left grip. The average instead of the
median was taken, as the group of individuals at each age for each
sex separately was small. In each of the graphs, the solid lines
show the results of this investigation while the broken lines
represent the norms used.
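The reduction rules described above (clothing allowance, best of three trials, age at nearest birthday) can be sketched in a few lines of Python. This is only an illustration: the field names and the sample readings are hypothetical, and the rule that six months or more counts toward the next birthday is an assumption, since the text does not say how an exact half-year was treated.

```python
from dataclasses import dataclass
from typing import Sequence

CLOTHING_ALLOWANCE_LB = 2.5  # deduction for clothing stated in the text


@dataclass
class RawRecord:
    """One child's raw readings (field names are hypothetical)."""
    age_years: int
    age_months: int
    gross_weight_lb: float
    right_grip_trials: Sequence[float]
    left_grip_trials: Sequence[float]
    breathing_trials: Sequence[float]


def age_at_nearest_birthday(years: int, months: int) -> int:
    # Assumption: six months or more past a birthday counts toward the next.
    return years + 1 if months >= 6 else years


def reduce_record(rec: RawRecord) -> dict:
    """Apply the reduction rules: net weight and best of three trials."""
    return {
        "age": age_at_nearest_birthday(rec.age_years, rec.age_months),
        "net_weight_lb": rec.gross_weight_lb - CLOTHING_ALLOWANCE_LB,
        "right_grip": max(rec.right_grip_trials),
        "left_grip": max(rec.left_grip_trials),
        "breathing_capacity": max(rec.breathing_trials),
    }


# Hypothetical readings, not taken from the study.
example = RawRecord(11, 7, 82.5, (40, 44, 42), (38, 37, 39), (110, 118, 115))
print(reduce_record(example))
```

Keeping the raw trials on the record and reducing them in one place mirrors the card system described in the text, where each card could be re-sorted without recopying the data.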

On  the  whole  it  will  be  seen  from  the  graphs  that  the 
measurements  are  below,  though  approximating  the  norms: 
the  greatest  divergence  occurs  in  the  case  of  the  group  of  14 
year  old  girls  where  the  measurements  are  lower  than  those 
of  a  normal  child  six  months  or  a  year  younger. 

Height 

From the age-height distribution table, it is evident that at the
ages of 11, 12 and 13, the tallest child is a girl while at all ages
except 12 and 14, the shortest is a boy. At the ages of 10, 13, and
15, the girls exceed the boys in height. It is to be remembered in
this connection that the 14 year old group is shorter than the 13
year old group and shorter than the norms. The variation of
individual heights in the different age groups is greatest for boys
between the ages of 13 and 15; for girls from 11 to 14 years. These
facts are in accordance with the conclusion that girls reach their
period of accelerated growth at an earlier age than boys.

CHART VI

The ratio between the total stature and sitting height varies with
little regularity; in the case of the boys, this ratio increases in
variable amounts from 6 to 8 years of age; decreases to 13 years and
then increases. The girls show less irregularity; the ratio rises
from 10 to 16 years (with the exception of the 14 year group) though
with varying increments. This is not in accordance with the
conclusion reached by Boas that this ratio decreases as age advances
until 13 years of age for girls and 15 for boys when it increases in
each case. In about 45 per cent of the individuals, the total height
of one child was found to be several inches less than that of
another while the length of trunk of the shorter individual was
several inches greater than that of the taller. This variability was
found to occur between the ages of 12 and 14 for both boys and
girls, evidently the period of adolescence. The coefficient of
correlation by the Pearson formula between the total stature and the
sitting height was found to be +.959 for boys, and +.957 for girls.
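For readers who wish to see how a coefficient of this kind is obtained, the following is a minimal sketch of the Pearson product-moment formula in Python. The paired heights are hypothetical values chosen for illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical standing and sitting heights in inches.
stature = [48.0, 50.0, 53.0, 55.0, 58.0]
sitting = [25.5, 26.0, 27.5, 28.0, 29.5]
print(round(pearson_r(stature, sitting), 3))
```

A value near +1, such as those reported above, means that taller children almost always have proportionately longer trunks, even though individual exceptions occur.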

Weight 

The weight of the individuals of this group as measured in pounds
extends over a greater range than the height in inches. Ages 11, 13,
14, and 15 show the greatest difference between individuals of the
same age group for boys, and 12, 14, and 15 for girls. It is to be
noted that at all ages the average weight of the girls ranges from
two to fifteen pounds less than the normal, with the single
exception of the 13 year old group, which surpasses the norms not
only in weight but in the other measurements as well. The heaviest
child at all ages except 12 and 15 is a boy, while the lightest at
11, 12, and 14 is a girl. This does not show an exact parallel with
the tallest and shortest of these age groups. The weight increases,
as is the case with the height, most for the boys from 11 to 12
years of age, and from 12 to 13 for the girls. Again it must be
recalled, however, that the 13 year old group of girls exceeded the
norms while the 12 year group was subnormal.

When the weight-height indices for the age groups are plotted, they
are found to be below the Baldwin norms for the boys in each age
group except at 12 years; this divergence was greater among the
girls than among the boys, the exception occurring at the 13 year
group, which is higher than the norms.

Grip 

While in the case of some individuals, the grip of the left hand was
found to be greater than that of the right, the averages of the age
groups showed the right hand to be the stronger. In the case of the
thirteen year old boys, the grip of the right hand was the same as
at the age of 12 years though the left hand grip increased a pound
and a half from 12 to 13 years. Only in the 13 year group did the
girls exceed the boys in this test, but this may be explained by the
fact that a girl of 13 surpassed all of the other girls and boys but
one in grip, thus raising the average of the 13 year group. The boys
increased in grip more between the ages of eight and nine years, and
11 and 12 years, than at any other age; and the girls showed the
greater increase from 14 to 15 years. The 14 year old group of
girls, however, was found to be sub-normal.

The  correlations  with  height  were  +  .911,  and  +  .733  for 
boys  and  girls  respectively. 

The average age of each grade of this school was found to be from
one to two and a half years greater than the normal age, the girls
being more retarded than the boys. In the average measurements for
each grade, the boys have been only slightly below the norms and in
a few cases above. The girls, however, have been sub-normal in every
case with the single exception of the height of the sixth grade.

Lung Capacity

On account of the small number of cases in certain particular ages
we have given an age-grade distribution table with the average lung
capacity for each year. In comparing these norms with the norms of
Baldwin's investigation we find them as a general rule lower. Among
the boys from five to ten years our norm follows the regular norm.
Among the girls from five to ten the lung capacity is above the
average. At 11 and 12 the girls show a considerable decrease below
the norm. Among the boys for these ages there is an increase for the
11th year and a decrease from the 12th to the 16th year inclusive.
In fact all the ages among the boys are below the average. There is
one case only in the 16th year, especially good, hence the reason
for this variance.

In the 13th year for the girls there is a marked increase over the
average, and from the 14th year to the 17th year, inclusive, there
is a marked decrease in the average lung capacity. By an examination
of the other measurements of the group of girls in the 13th year
there is found a corresponding increase. This corresponds to the
results of other investigations. Physically this is a retarded group
of boys and girls.


THE APPLICATION OF THE YERKES-BRIDGES
POINT SCALE AND THE STANFORD REVISION
OF THE BINET SCALE FOR
MEASURING INTELLIGENCE

Robert L. Bates, Nora V. Boston, S. M. Clark, W. R. Flowers,
Susan Z. Housekeeper, Aimee Jones, Rosalie R. Martin,
and Alice W. Ratcliff

In order that the reader may see the comparative results for all
pupils tested by the Point Scale and the Stanford Revision or Terman
Scale, the results are given below in tabulated form. In this table,
G. signifies grade; C. age, chronological age; Pt. Sc., Point Scale;
I. A., coefficient of intellectual ability; T. age, Terman age;
I. Q., intelligence quotient. According to the Stanford Scale all
children whose I. Q. is 70 or below are considered mentally
deficient. In this group, measured by the Point Scale, 1 boy and 3
girls are recorded as mentally deficient, but measured by the
Stanford Revision Scale, 1 boy and 4 girls are mentally deficient.

The Yerkes-Bridges Point Scale is easier than the Stanford Revision
of the Binet Scale after the "mental age" of eleven years, as may be
seen by Table 4 where the individual scores are given. The Point
Scale results are found in mental ages by comparing an individual's
total score with expected scores in norms. The Coefficient of
Intelligence is found by dividing the actual score by the expected
score. The results for the Stanford Revision Scale are scored in
terms of years and months and the Intelligence Quotient is found by
dividing the mental age by the chronological age. The Coefficient of
Intelligence and the Intelligence Quotient are not directly
comparable and must not be confused.
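The two quotients just defined can be illustrated with a short Python sketch. The function names are hypothetical, ages are written as "years-months" in the manner of Table 4, and the multiplication by 100 with rounding to a whole number is an assumption inferred from the way the I. Q. column of Table 4 is printed. The two checks use entries from Table 4 (girl No. 1 and boy No. 2); the scores passed to the coefficient are made up.

```python
def to_months(age: str) -> int:
    """Convert an age written as 'years-months' (or just 'years') to months."""
    years, _, months = age.partition("-")
    return int(years) * 12 + (int(months) if months else 0)


def intelligence_quotient(mental_age: str, chronological_age: str) -> int:
    """Mental age divided by chronological age, expressed on a base of 100."""
    ratio = to_months(mental_age) / to_months(chronological_age)
    return int(100 * ratio + 0.5)  # round half up to a whole number


def coefficient_of_intelligence(actual_score: float, expected_score: float) -> float:
    """Point Scale score actually earned divided by the score expected in the norms."""
    return actual_score / expected_score


# Girl No. 1: chronological age 5-3, Terman age 6-2.
print(intelligence_quotient("6-2", "5-3"))
# Boy No. 2: chronological age 6-2, Terman age 7-8.
print(intelligence_quotient("7-8", "6-2"))
# Hypothetical Point Scale scores, for illustration only.
print(coefficient_of_intelligence(44, 40))
```

Because one ratio compares ages and the other compares scores, equal numerical values on the two scales need not indicate equal standing, which is the caution given in the text.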

In the Demonstration School, 129 children were given individually
the Stanford Revision of the Binet Scale for rating intelligence and
the Yerkes-Bridges Point Scale. The individual measures are given in
Table 4. The general results of these examinations are:

                              Stanford Revision   Yerkes-Bridges

Retarded      6 years                  2                 2
              5 years                  1                 1
              4 years                  7                 3
              3 years                  8                 5
              2 years                 21                 7
              1 year                 2[?]               13
Normal                                35                30
Accelerated   1 year                  15                30
              2 years                 10                17
              3 years                  3                 5
              4 years                  1                10
              5 years                  0                 3
              6 years                  0                 1
Not tested by Yerkes Scale                               2

[?] marks a digit that is illegible in the source.


CHART X.* A Comparison of the Two Scales in Estimating the Mental
Ability of 70 Boys (Point Scale and Terman; pupils grouped as
accelerated, normal, and retarded)

CHART XI.* A Comparison of the Two Scales in Estimating the Mental
Ability of 61 Girls (Point Scale and Terman; pupils grouped as
accelerated, normal, and retarded)
TABLE 4

COMPARISON OF MEASURES BY THE POINT SCALE AND THE STANFORD REVISION SCALE¹

                    Boys                                             Girls
No.  G.   C.Age  Pt.Sc  I.A.   T.Age  I.Q.  |   No.  G.   C.Age  Pt.Sc  I.A.   T.Age  I.Q.

  1   1       6      6   103     [?]   106  |     1   1     5-3      6   123     6-2   117
  2   1     6-2      8   144     7-8   124  |     2   2     6-6      9   165     7-4   113
  3   1     6-8      8   128     7-2   108  |     3   2     7-8    [?]    91     7-6    98
  4   2     6-8      8   154     7-6   113  |     4   2    8-10      8   100    7-10    89
  5   1     6-9      6   100     6-4    94  |     5   4    8-11     14   153    11-3   126
  6   2     7-1      9   148     8-4   118  |     6   2     9-5      8    74     8-4    88
  7   2     7-5      8   141     9-4   126  |     7   4    10-1      9    96    9-10    98
  8   2    7-11      6    84     6-8    84  |     8   4    10-3      8    86     9-2    89
  9   2       8      4    36     4-8    58  |   (9)   4    10-3      8    69    8-10    86
 10   4     9-3     10   117    10-3   111  |    10   5    10-5     12   121   11-10   114
 11   4     9-5     10   110    9-10   104  |    11   5    10-9     11   109    10-3    95
 12   4     9-5     11   122    10-2   108  |    12   4   10-11     11   108    11-2   102
 13   5     9-9     12   129   11-10   121  |    13   5   10-11     12   118    11-9   108
 14   4    10-2      9    98    9-10    97  |    14   5    11-2     12   113    10-9    96
 15   4    10-2     14   132    11-3   111  |    15   4    11-4     11   103    11-9   104
 16   5    10-2     15   146    12-7   124  |    16   4    11-4      9    84       9    79
 17   5    10-3     14   130    11-4   111  |    17   6    11-4     15   126   10-10    96
 18   4    10-4     12   124    11-9   114  |    18   5    11-5     15   130    11-8   102
 19   4    10-8     14   126    12-3   115  |    19   4    11-6     10  9[?]    10-8    93
 20   5   10-11     15   134    11-8   107  |    20   4    11-7     11    98    11-3    97
 21   5   10-11     15   134    12-4   113  |  (21)   6    11-8     13   108    12-3   105
 22   4      11     11   106    9-11    90  |    22   4   11-10     11    94    11-4    96
 23   5      11     11   101   10-10    98  |    23   4    12-2     14   114    12-1    99
 24   4    11-1     11   104   10-10    98  |    24   6    12-2     14   105    12-4   101
 25   4    11-1     12   118   10-10    97  |    25   6    12-2     14   112   11-10    97
 26   6    11-2     14   131    13-6   121  |    26   6    12-4     13   102   11-10    96
 27   4    11-3     11   103     9-2    81  |    27   6    12-6     11    90    10-8    85
 28   5    11-3     10    95    10-1    90  |    28   4    12-8     10    82     9-4    74
 29   5    11-3     16   134      14   124  |    29   4    12-8     11    89    11-1    88
 30   6    11-3  Adult   148    15-2   135  |    30   6    12-8     15   113   11-11    94
 31   4    11-4     15   129    13-6   119  |    31   7    12-8     15   113    13-5   106
 32   4    11-4     11    98    11-1    98  |    32   6   12-11     11    99    11-8    90
 33   6    11-4     15   126    14-3   126  |    33   6    13-1     10   [?]      10    76
 34   5    11-5     14   121    11-8   102  |    34   6    13-4     14   105    14-2   106
 35   5    11-5     13   110    12-8   111  |    35   5    13-5     14   101    10-9    80
 36   5      12     13   104    10-3    85  |    36   7    13-5     13   100    11-1   [?]
 37   5      12     15   113   10-11    91  |    37   7    13-6     14   104    13-3    98
 38   6      12     15   115    13-9   114  |    38   4    13-7     14   101    12-6    92
 39   5    12-4     12   100   11-10    96  |    39   6    13-8     14   110      12    88
 40   7    12-4     14   109    12-3    99  |    40   7    13-8     15   114    12-8   [?]
 41   7    12-5  Adult   121    15-5   124  |    41   7    13-8     15   110      11    80
 42   4    12-6     10    85    10-9    86  |    42   4      14      8    50     9-3    66
 43   6    12-9     11    86    12-4    97  |    43   4      14      9    70     9-9    70
 44   5   12-10     14   106    11-9    92  |    44   6      14     14   101    10-1    72
 45   6   12-11     15   113    12-4    95  |    45   7      14     15   109    12-2    87
 46   6      13     11    90    11-2    86  |    46   5    14-1     14    99    13-7    96
 47   4    13-3     11    90    10-2    77  |    47   6    14-1     14   100    12-3    87
 48   7    13-8     15   113   11-10    87  |    48   6    14-1     13    98    11-9    83
 49   6    13-9     14   109    11-7    84  |    49   7    14-3     15   113    13-9    96

[?] marks an entry that is missing or illegible in the source.
TABLE 4. Continued

                    Boys                                             Girls
No.  G.   C.Age  Pt.Sc  I.A.   T.Age  I.Q.  |   No.  G.   C.Age  Pt.Sc  I.A.   T.Age  I.Q.

 50   6  13-[?]     15   115    13-8    99  |    50   5    14-4     11    84    9-10  6[?]
 51   6   13-10     14   102    13-2    95  |    51   6    14-5     11    92    11-2  7[?]
 52   6      14     15   109    13-6    96  |    52   7    14-5     13    95    12-9  8[?]
 53   7      14     13    97   13-11    99  |    53   7   14-11     14    97   11-10  7[?]
 54   7      14  Adult   117   12-10    92  |    54   6      15     14    96    12-6  8[?]
 55   6    14-1     14   101      13    92  |    55   6      15     12    87    11-3  7[?]
 56   5    14-2     16   116      14    99  |    56   7    15-4  Adult   109   14-11  9[?]
 57   6    14-2     14   105   12-11    91  |    57   7    15-4  Adult   103    13-9  9[?]
 58   6    14-3     16   117   11-10    83  |    58   7    15-6     11    83   11-10  7[?]
 59   5    14-4     14   102    12-6    87  |    59   7      16     15    93    12-6  7[?]
 60   7    14-5     15   105    12-7    87  |    60   7    16-1     12    81    9-10    61
 61   7    14-9     12    89    12-9    86  |    61   7    16-6     10    77    10-9  6[?]
 62   7   14-10      X     X    12-2    82  |
 63   7   14-11     15   109    13-2    88  |
 64   7   14-11     16   113      13    87  |
 65   5      15     15    99    10-8    71  |
 66   7    15-3     15    99   13-10    91  |
 67   7    15-4      X     X    14-3    93  |
 68   7    15-6     14    92   12-10    83  |
 69   7    15-6     15   101    14-8    95  |
 70   6    16-1     15    91   13-10    86  |

[?] marks an entry or digit that is illegible in the source.

* In the accompanying graphs retarded means that the mental age is a
year or more below the chronological age; normal, that the mental
age and chronological age are less than a year apart; accelerated,
that the mental age is a year or more above the chronological age.

It  will  be  seen  in  the  case  of  both  boys  and  girls  that  the  two 
scales  agree  rather  closely  in  designating  normal  children,  but  that 
they  are  far  apart  in  the  matter  of  retarded  and  accelerated  pupils, 
especially  in  the  upper  grades.  The  Point  Scale,  for  example,  has 
only  2  boys  and  4  girls  retarded  in  grade  7,  while  the  Terman  Scale 
has  10  boys  and  11  girls  retarded  in  that  grade.  Moreover,  the 
Point  Scale  has  6  boys  and  6  girls  accelerated  in  grade  7,  while  the 
Terman Scale shows 1 boy and no girls accelerated.

By  general  consent  grade  7  was  a  very  slow  class,  containing 
many  over-aged,  backward  pupils.  The  Terman  results  approximate 
closely  the  actual  conditions  of  mentality.  This  would  indicate 
that  the  Point  Scale  is  too  easy  for  the  upper  grades.    W.  R.  F. 
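
The three categories defined in this note can be stated as a small rule. The following Python sketch is an illustration only; the function names are hypothetical, and ages are written as "years-months" as in Table 4.

```python
def to_months(age: str) -> int:
    """Convert an age written as 'years-months' (or just 'years') to months."""
    years, _, months = age.partition("-")
    return int(years) * 12 + (int(months) if months else 0)


def classify(chronological_age: str, mental_age: str) -> str:
    """Apply the one-year rule of the note to a pair of ages."""
    difference = to_months(mental_age) - to_months(chronological_age)
    if difference >= 12:
        return "accelerated"   # a year or more above
    if difference <= -12:
        return "retarded"      # a year or more below
    return "normal"            # less than a year apart


print(classify("6-2", "7-8"))     # boy No. 2, Terman age
print(classify("12-4", "12-3"))   # boy No. 40, Terman age
print(classify("16-6", "10-9"))   # girl No. 61, Terman age
```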

¹ The estimated number of months for the Point Scale age was taken
into consideration when finding the I. A.


IV 

APPLICATION OF THE COURTIS STANDARD
RESEARCH TESTS IN ARITHMETIC,
SERIES B

Alice  K.  Bielaski  and  George  Lloyd  Palmer 

The Courtis Standard Research Tests in Arithmetic are intended to
measure ability in the four fundamental operations with integers.
Too frequent use of them is not recommended as they are "neither
examinations nor teaching devices." To measure the progress made in
a school year under any system of instruction, it is well to use the
tests at the opening of the session, at the mid-year and at the
close of the year. Four forms of this series may be obtained and a
different form should be used for each of the three trials. The
forms are of equal difficulty and therefore the choice between them
is an arbitrary one.

There is one set of tests for all grades because it is believed that
true mental progress is best revealed by increased facility in the
use of the same material, just as physical growth is shown by the
changes in the results obtained by the use of the same measuring
scales.

Test one consists of twenty-four addition examples, each containing
nine addends of three digits; test two consists of twenty-four
subtraction examples containing numbers of eight or nine digits;
test three consists of twenty multiplication examples, each
multiplicand containing four digits and each multiplier, two or
three; test four consists of twenty-four division examples, each
dividend containing four or five digits and each divisor, two. The
digits are arranged in such a way that all the fundamental
combinations are represented. Eight minutes are allowed for
addition, four for subtraction, six for multiplication, and eight
for division.

Full instructions for giving, scoring, and tabulating are found in
the folders accompanying the tests. The record sheet furnishes a
convenient device for obtaining the class standing in terms of
median speed and accuracy, as well as the percentage of efficiency.
By comparison with the standard median scores in speed and accuracy,
the proficiency of a grade may be determined.

CHART XII. Distribution of Scores, Courtis Arithmetic Tests,
Series B (distribution chart for Grade 7)

The  results  of  the  tests  as  given  in  the  Johns  Hopkins 
Demonstration  School  are  presented  in  Table  5. 

In the first trial the 4th was the only grade proficient in all of
the four fundamental operations. All the other grades were
noticeably deficient in addition, the 6th grade showing practically
no advance in ability over the 5th grade. However, all the grades
were proficient in division. The lowest general standing was that of
the 6th grade in which five of the eight deviations were negative.

In the 4th, 5th and 6th grades, the results of the second trial did
not show the expected advance in ability, but this may be accounted
for partially by the fact that the tests were given on a very hot
and sultry day. In the 7th grade where the second trial was given on
a cooler day and under more favorable conditions, there was a marked
improvement in all four operations. It is believed that under
similar circumstances, the tests given in the other grades would
have shown satisfactory advancement.

On the whole, it would seem that subtraction and division are better
learned than addition and multiplication and that these two latter
processes should receive more emphasis in teaching. But it may be
well to consider whether the addition and multiplication tests are
well standardized. Would examples containing fewer addends furnish a
better scale of measurement in addition?

The Courtis definition of efficiency is somewhat misleading. To be
efficient a pupil must be 100 per cent accurate and maintain a speed
equal to, or greater than, the standard speed. This gives no credit
to one who has exceeded the standard in attempts and rights, but has
fallen below 100 per cent in accuracy. E.g., in test No. 2,
Subtraction, there was one pupil in the 5th grade who correctly
solved twenty-two out of twenty-four examples. The standard score in
this test is 9-9, and if the pupil had attempted 9 examples and had
solved them correctly, he would have been efficient, but having
attempted twenty-four and solved but twenty-two correctly, he falls
below efficiency. Is this grading fair?
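
The rule criticized above can be written out in a few lines of Python, which makes the difficulty plain. This is a sketch of the definition as the text states it (perfect accuracy together with at least the standard number of attempts); the function name is hypothetical.

```python
def is_efficient(attempts: int, rights: int, standard_attempts: int) -> bool:
    """Efficient only if every attempt is right and speed meets the standard."""
    perfectly_accurate = attempts > 0 and rights == attempts
    fast_enough = attempts >= standard_attempts
    return perfectly_accurate and fast_enough


# Standard for the 5th grade in subtraction: 9 attempts, 9 rights.
print(is_efficient(attempts=9, rights=9, standard_attempts=9))
# The pupil cited in the text: 24 attempts, 22 rights.
print(is_efficient(attempts=24, rights=22, standard_attempts=9))
```

The second pupil solved more than twice as many examples correctly as the standard requires, yet is rated below the first, which is the point of the question raised in the text.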

These tests are valuable tools for making rough measurements. Tests
consisting of carefully graded examples are finer instruments for
diagnosing individual ills and hence furnish a better guide to the
teacher in selecting a remedy.


RESULTS  IN  ARITHMETIC  BY  WOODY 
SCALE  "A" 


W.  H.  Davis  and  R.  L.  Clark 

The Woody Arithmetic Scale, Series A, developed in 1915-16 by
Clifford Woody from about 20,000 test sheets of pupils in seven
different school systems in Indiana, New Jersey, Connecticut, and
New York, aims to test pupils, classes, schools, and systems in
accuracy in the four fundamental operations.

The  four  tests  contain  148  problems.  Those  of  each  test 
are  of  increasing  difficulty  and  each  problem  is  of  fixed 
value.  The  progress  from  grade  to  grade,  therefore,  can  be 
definitely  determined.  Twenty  minutes  is  allowed  for  each 
test,  and  absolute  accuracy  is  the  basis  for  scoring.  The 
judgment  of  the  scorers  is  thus  eliminated. 

The scale was used by the writers in testing the fourth, fifth,
sixth, and seventh grades in the Demonstration School of Johns
Hopkins University in July, 1917. The pupils in this school, in
general, failed of promotion during the preceding year and were
attending the school to prepare for the next grade. This condition
and the fact that Baltimore allots an unusual amount of time to
Arithmetic would not warrant the expectation of class medians and
scores higher than the standards derived by Woody for grades in the
middle of the school year.

From  the  tabulations  of  the  original  data,  showing  the 
particular  and  total  problems  solved  by  each  pupil,  and  the 
number  of  pupils  who  solved  each  problem,  were  obtained 
the  following  distribution  tables,  medians,  and  scores. 
Comparison of Class and Standard Medians

                     Addition                       Subtraction
Grade       Class  Standard  Deviation      Class  Standard  Deviation
IV           20.9     18.3      +2.6         19.7     15.7      +4
V            28.5      [?]      +5.5         24.7     20.4      +4.3
VI           29.5     29.7      -.2          28.6     24.9      +3.7
VII          32       32.4      -.4          30.6     28.5      +2.1

                  Multiplication                     Division
Grade       Class  Standard  Deviation      Class  Standard  Deviation
IV           17.5     11        +6.5         15.5      9.9      +5.6
V            23       18.3      +4.7         22.5     16.5      +6
VI           29.3     26.1      +3.2          [?]     23.8      +2.1
VII          32.5     30.6      +1.9         26.9     27.4      -.5

Comparison of Class and Standard Scores

                     Addition                       Subtraction
Grade       Class  Standard  Deviation      Class  Standard  Deviation
IV           6.76     6.11      +.65         4.95     4.22      +.73
V            7.77     6.99      +.78         6.82      [?]     +1.35
VI           8.20     7.95       [?]         7.04      [?]      +.58
VII          8.17     8.65      -.48         8.01     7.31      +.70

                  Multiplication                     Division
Grade       Class  Standard  Deviation      Class  Standard  Deviation
IV           5.44     4.05     +1.39         4.65     3.21     +1.44
V            6.10     5.53      +.57         5.42      [?]      +.48
VI           7.28     6.72      +.56         6.11      [?]      +.24
VII          7.16     7.26      -.10         6.51     6.59      -.08

[?] marks an entry that is missing or illegible in the source.
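
In every legible row of these tables the deviation appears to be the class value less the standard value. A short Python sketch, using the subtraction medians as printed, shows the check; the variable and function names are hypothetical.

```python
# Subtraction medians from the table above: grade -> (class, standard).
subtraction_medians = {
    "IV": (19.7, 15.7),
    "V": (24.7, 20.4),
    "VI": (28.6, 24.9),
    "VII": (30.6, 28.5),
}


def deviation(class_value: float, standard_value: float) -> float:
    """Class value minus standard value, rounded to one decimal place."""
    return round(class_value - standard_value, 1)


for grade, (class_value, standard_value) in subtraction_medians.items():
    print(grade, f"{deviation(class_value, standard_value):+.1f}")
```

A positive deviation means the class median exceeds Woody's standard for that grade; a negative one means it falls short.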

The class score is the value of that problem which 50 per cent of
the class can solve correctly and is obtained by getting the average
value of the five problems which came nearest to being solved by 50
per cent of the class. Thus in subtraction in grade IV:

Problem   Per Cent Solving   Est. Value   Correction     Value
  15           5[?]             3.70        + .15     =   3.85
  17           65               4.41        + .57     =   4.98
  18           69               4.42        + .74     =   5.16
  19           38               5.18        - .45     =   4.73
  22           57               5.57        + .26     =   6.01
                                                        -------
                                                    5 ) 24.73

                                        Class Score       4.95
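
The final step of this computation, averaging the five corrected values, can be checked with a brief Python sketch. Only the averaging is shown; the estimated values and corrections come from Woody's tables and are taken here as given. The function name is hypothetical.

```python
# Corrected values of the five subtraction problems, grade IV, as printed above.
corrected_values = [3.85, 4.98, 5.16, 4.73, 6.01]


def class_score(values):
    """Average of the five problems nearest to being solved by 50 per cent."""
    if len(values) != 5:
        raise ValueError("the method as described uses exactly five problems")
    return round(sum(values) / len(values), 2)


print(round(sum(corrected_values), 2))  # total of the five values
print(class_score(corrected_values))    # class score
```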

Conclusions

1. The distribution sheets disclose overlapping of grades as
indicated in the following table:

Percentage of Overlap in Number of Problems Solved

               Addition       Subtraction     Multiplication     Division
Grade        Above  Below     Above  Below     Above  Below     Above  Below
Grade IV       0      ..        0      ..       10      ..       6.6     ..
Grade V       48     14       15.4    3.8       18    10.4      28.6     7
Grade VI      38     35       16.6    6.6       29     3        42      13
Grade VII     ..     21        ..    22.5       ..    22         ..     38

Above: per cent above the median of the next higher grade.
Below: per cent below the median of the next lower grade.

2. Comparisons of class with standard medians and scores show that
the fourth grade far excels the standards and that these deviations,
decreasing, in general, in the succeeding grades, become negative
deviations in two medians and three scores of the seventh. The facts
that in Baltimore arithmetic is taught in the first grade and that
more time is allotted to arithmetic in the first four grades than in
any city other than Cincinnati would account for the excellence of
the fourth grade, and the greater retardation in the upper grades
would account in great part for the lower achievements of the
seventh.

3.  The  practical  use  of  the  scale  is  conditioned  by  the 
amount  of  time  required  to  tabulate  the  results  and  calculate 
the  final  scores. 


VI 

AN  EXPERIMENT  IN  MEASURING  THE  HAND- 
WRITING OF  A  GROUP  OF  CHILDREN  FOR 
SPEED  AND  QUALITY 

William  R.  Flowers 

One hundred eighteen children were included in this survey of
hand-writing measured for speed and quality, comprising grades 4, 5,
6, and 7.

Adopting the plan of Starch in his "Educational Measurements," each
pupil was given exactly two minutes in which to write as often as
possible the sentence "Mary had a little lamb," in order to find the
number of letters written per minute. Immediately following this
test, each pupil wrote a paragraph from dictation, the vocabulary of
which was simple enough for even the youngest. This was the test for
quality. The writing was done on lined paper of uniform size, with
pen and ink. Nothing was said to the pupils to urge unusual speed or
unusual carefulness.
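
The speed measure described above reduces to a count of letters divided by the two minutes allowed. A minimal Python sketch follows; the function names and the sample paper are hypothetical, and only alphabetic characters are counted as letters.

```python
SENTENCE = "Mary had a little lamb"
TEST_MINUTES = 2.0  # time allowed in the speed test


def count_letters(text: str) -> int:
    """Count alphabetic characters, ignoring spaces and punctuation."""
    return sum(1 for ch in text if ch.isalpha())


def letters_per_minute(written: str, minutes: float = TEST_MINUTES) -> float:
    return count_letters(written) / minutes


# A hypothetical pupil who finished five copies and began a sixth.
paper = " ".join([SENTENCE] * 5) + " Mary had"
print(count_letters(SENTENCE))
print(letters_per_minute(paper))
```

Counting letters rather than copies gives credit for a partly finished sentence, so two pupils who stop at different points in the same copy receive different speeds.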

In measuring the quality three scales were used; viz. the Ayres, the
Thorndike, and the Freeman. Each specimen was graded first by the
Ayres scale and the number recorded so that the examiner could not
see it, when, twenty-four hours later, the same specimen was graded
by the Thorndike scale. After another interval of twenty-four hours
each specimen was again graded by the Freeman scale, five points of
judgment being recorded for the last scale: uniformity of slant,
uniformity of alignment, quality of line, letter formation, and
spacing. The Ayres scale grades the quality on a percentage basis
from twenty to ninety. The Thorndike scale grades the quality by a
series of numbers from 4 to 18. The Freeman scale grades the quality
by giving a number from 1 to 5 for each of the above mentioned five
characteristics, the total of these marks, 5 to 25, being the final
rating. For convenience of comparison with the Ayres scale each of
the others has been changed to a percentage basis.
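
The Freeman rating described above is simply the sum of five marks. The following Python sketch shows that total with a check on the permitted range; the function name and sample marks are hypothetical, and the conversion to a percentage basis is left out because the text does not state how it was done.

```python
FREEMAN_TRAITS = (
    "uniformity of slant",
    "uniformity of alignment",
    "quality of line",
    "letter formation",
    "spacing",
)


def freeman_total(marks: dict) -> int:
    """Sum the five Freeman marks, each of which must be a whole number 1 to 5."""
    total = 0
    for trait in FREEMAN_TRAITS:
        mark = marks[trait]
        if mark not in (1, 2, 3, 4, 5):
            raise ValueError(f"mark for {trait!r} must be 1 to 5, got {mark}")
        total += mark
    return total


# Hypothetical marks for one specimen.
specimen = dict(zip(FREEMAN_TRAITS, (3, 4, 2, 3, 4)))
print(freeman_total(specimen))
```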


CHART XIV. Comparing ratings for quality in grades 4 to 7 (key:
Thorndike, Ayres, Freeman, teacher; remainder of caption illegible)


After the three measuring scales had been used, the grade
teacher was asked to put her estimate on her pupils' papers,
without, however, using any measuring scale.

The  following  summary  gives  for  each  grade  the  average 
rating  for  quality  by  each  of  the  preceding  four  criteria : 


Grade    Ayres Scale    Thorndike    Freeman    Teacher's Estimate
  4         47.6          44.0         46.4           49.0
  5         43.9          41.0         46.1           47.5
  6         53.9          51.0         50.9           52.5
  7         53.0          51.0         56.3           56.7

It will be seen from the above ratings that the Thorndike
scale  gives  the  lowest  rating  in  each  grade  except  the  sixth ; 
that  the  average  of  the  Ayres  and  Freeman  ratings  is  al- 
most the  same  (49.6  for  Ayres,  50.2  for  Freeman)  ;  and 
that,  except  in  grade  6,  the  teacher's  estimate  is  higher  than 
that  obtained  by  applying  any  scale.  In  graphic  form  these 
results  are  shown  thus : 
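
The comparisons drawn from the summary can be checked with a short Python sketch. The figures are those of the summary above; the simple unweighted mean over the four grades is an assumption, since the text does not say how its averages were formed.

```python
from statistics import mean

# Average quality ratings by grade, as given in the summary.
ratings = {
    "Ayres":     {4: 47.6, 5: 43.9, 6: 53.9, 7: 53.0},
    "Thorndike": {4: 44.0, 5: 41.0, 6: 51.0, 7: 51.0},
    "Freeman":   {4: 46.4, 5: 46.1, 6: 50.9, 7: 56.3},
    "Teacher":   {4: 49.0, 5: 47.5, 6: 52.5, 7: 56.7},
}
grades = (4, 5, 6, 7)

# Unweighted mean of the four grade averages for each criterion.
scale_means = {name: round(mean(by_grade.values()), 1) for name, by_grade in ratings.items()}

# Which criterion gives the lowest and the highest rating in each grade.
lowest_by_grade = {g: min(ratings, key=lambda name: ratings[name][g]) for g in grades}
highest_by_grade = {g: max(ratings, key=lambda name: ratings[name][g]) for g in grades}

print(scale_means)
print(lowest_by_grade)
print(highest_by_grade)
```

The unweighted Ayres mean agrees with the 49.6 quoted in the text. The unweighted Freeman mean comes out slightly below the quoted 50.2, which suggests the published averages may have been weighted by the number of pupils in each grade.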

The accompanying graphs (charts XV and XVI) compare
the average quality of the classes by the Thorndike
scale  with  the  standard  curve  for  that  scale  as  given  by 
Starch.  It  will  be  noticed  that  only  grade  five  falls  below, 
and  that  grades  four  and  six  are  above. 

In  the  summary  it  will  be  noted  that  there  was  not  a  very 
wide  difference  in  the  average  quality  of  the  writing  of  any 
grade  as  compared  with  the  grade  above  or  below  it.  This 
is  a  usual  result,  and  is  shown  more  strikingly  by  the  fol- 
lowing distribution  charts. 

The Freeman scale was chosen as the basis of this comparison,
because the experimenter thought that the detailed
criticism  of  quality  of  writing  afforded  by  that  scale  would 
be  the  fairest  criterion.  It  is  probable  that  the  same  over- 
lapping would  occur  with  any  other  scale. 

CHART XVI. Starch's Standard Compared with the Demonstration School

Overlapping in speed also occurs, there being, for example,
7 pupils in grade 4, 12 in grade 5, 10 in grade 6, and
11 in grade 7 all writing at the rate of 45-54 letters per
minute. This is 34 per cent of the whole number tested.


Correlations 

The inquiry naturally presents itself as to the correlation
between quality and speed. As might be expected there is a
negative correlation of -.38, with probable error of .07;
i.e., the higher the quality the lower the speed.
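
For readers who wish to reproduce a coefficient of this kind, the following Python sketch computes a product-moment correlation and its probable error. The text does not state which formulas were used for this figure; the classical expression 0.6745(1 - r^2)/sqrt(n) is assumed here, and the pupil data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def probable_error(r, n):
    """Classical probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

# Hypothetical pupils, not data from the survey.
speed = [40, 45, 50, 55, 60, 65]     # letters per minute
quality = [60, 55, 58, 50, 52, 45]   # quality rating, per cent
r = pearson_r(speed, quality)
print(round(r, 2), round(probable_error(r, len(speed)), 2))
```

A negative value of r means that the faster writers in the sample tend to receive the lower quality ratings.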

Another interesting question is the correlation between
the teacher's estimate and the experimenter's. In grade 6
the teacher used four of Freeman's five points of judgment
(without a knowledge of Freeman's scale), and the correlation
for that scale between her estimate and the experimenter's
(by the Spearman formula) was +.57. The
teacher of grade four was familiar with the Ayres scale,
and the correlation between her estimate and the experimenter's
was +.62.
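
The Spearman formula referred to above can be sketched as follows, assuming the rank-difference form, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), which is exact only when there are no tied ratings. The two sets of ratings are hypothetical.

```python
def ranks(values):
    """Rank from 1 (largest value) downward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ratings of five specimens by a teacher and by the experimenter.
teacher = [50, 60, 45, 70, 55]
examiner = [48, 65, 50, 60, 52]
rho = spearman_rho(teacher, examiner)
print(rho)
```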

The teachers of grades 5 and 7 had never used any measuring
scale. In grade 5 the correlation between the teacher's
estimate and the experimenter's by the Ayres scale was
+.47; between the teacher's estimate and the Freeman
scale, +.50. In grade 7 the correlation between the teacher's
estimate and Ayres scale and between teacher's estimate
and Freeman scale was the same, +.47.

Criticism  and  Conclusions 

I.  Quality  of  handwriting  is  very  difficult  to  measure  ac- 
curately by  any  scale  in  use  at  present. 

II.  The  Ayres  scale  is  difficult  to  apply  because  it  does  not 
contain  sufficient  variety  of  specimens,  and  because  it  passes 
a  composite,  rather  than  a  detailed,  judgment.  It  essays  to 
measure  legibility.  Even  if  it  does  this  (which  is  doubtful) 
few  teachers  are  satisfied  with  writing  that  is  merely  legible. 

III. The Thorndike scale is superior to the Ayres in that
it contains more specimens. On the other hand, it seems to
the writer unfortunate that the specimens are not distributed
more equally among the various qualities (quality 10, e.g.,
having but one illustration), and that the system of grading
is so inconvenient. One wishes that Professor Thorndike
would arrange a new scale with the above difficulties eliminated.

IV.  The  Freeman  scale  seems  the  most  rational  because 
it  itemizes  the  characteristics  of  good  and  bad  writing  and 
judges  each  separately.  Its  judgment  is  detailed  and  spe- 
cific, not  composite.  It  is  also  most  practical  in  pointing 
out  to  pupils  exactly  the  faults  in  their  writing.  Value 
would  be  added  to  this  scale  if  five  grades  of  quality  instead 
of  three  were  given  and  more  specimens  in  each  grade  in- 
cluded. 

V. As shown by the coefficients of the correlation in
grades 4 and 6 on the one hand and grades 5 and 7 on the
other, some measuring scale is better than none in assisting
to a uniform grading of quality of writing.

VI. In view of the amount of overlapping in quality as
shown in the distribution charts, it seems desirable to have
grade measuring scales instead of one scale for all grades.
Under such a plan a pupil in any grade who reaches the
maximum of quality for his grade could be excused from
further formal drill in writing, unless his writing deteriorated.
This would recognize individual differences and
enable those with capacity for good writing to stress some
other subject in which they might be deficient. This plan
has been followed for the past two years with good results
in the school with which the writer is connected.

VII. Since it is not enough merely to write well, reasonable
speed also being demanded, it is desirable in all scales
to combine quality and speed and give one rating to include
both. This has not yet been satisfactorily worked out.


VII 

THE  KANSAS  SILENT  READING  TEST 

Mary  O.  Ebaugh 

The  Kansas  Silent  Reading  Test  is  designed  to  measure 
the  ability  "  to  interpret  the  meaning  of  sentences  and  para- 
graphs." The  two  factors — speed  and  comprehension — are 
combined  in  a  single  mark,  and  the  child's  ability  to  read  is 
measured  by  the  number  of  reading  exercises  which  he  can 
comprehend  accurately  within  a  given  time. 

The  test  includes  three  sets  of  exercises — one  for  grades 
3,  4  and  5 ;  one  for  grades  6,  7  and  8 ;  and  one  for  grades  9, 
10, 11 and 12. The first exercise for grade 3 is as follows:

"  I  have  red,  green  and  yellow  papers  in  my  hand.  If  I 
place the red and green papers on the chair which color do
I still have in my hand?"

The  last  exercise  in  the  last  set  is : 

"At  sea  level  water  boils  at  212  degrees  above  zero  on 
the  Fahrenheit  thermometer,  and  at  100  degrees  above  zero 
on  the  Centigrade  thermometer.  The  zero  point  on  the  Cen- 
tigrade thermometer  represents  the  same  temperature  as  32 
degrees  on  the  Fahrenheit  thermometer.  A  change  in  tem- 
perature which  would  raise  the  mercury  in  a  Centigrade 
thermometer 5 degrees would raise the mercury in a Fahrenheit
thermometer how many degrees?"

Each  exercise  contains  not  less  than  15  words;  few  con- 
tain more  than  60.  Each  is  supposed  to  be  subject  to  only 
one  interpretation  and  to  call  for  but  one  thing  so  that  what 
the  child  does  in  response  to  it  will  be  wholly  right  or  wholly 
wrong.  Each  is  so  planned  as  to  reduce  written  interpreta- 
tion to  a  minimum  so  as  not  to  confuse  ability  to  get  mean- 
ing with  ability  to  reproduce  meaning. 

The value of each exercise indicates the relative length of
time required on the average by children of a certain grade
to answer the exercise correctly.
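
A pupil's mark on such a test can be sketched in Python as the sum of the values of the exercises answered wholly right within the time limit. This reading of the scoring rule is an assumption drawn from the description above, and the exercise values shown are hypothetical, not those printed on the test.

```python
def kansas_score(exercise_values, correct_flags):
    """Sum the values of the exercises answered wholly right within the time limit."""
    total = sum(value for value, right in zip(exercise_values, correct_flags) if right)
    return round(total, 1)

# Hypothetical values for five exercises and one pupil's results.
values = [1.0, 1.3, 1.6, 2.1, 2.8]
answers = [True, True, False, True, False]
score = kansas_score(values, answers)
print(score)
```

Because each answer is counted as wholly right or wholly wrong, a partly correct response adds nothing to the mark.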

Some  of  the  exercises  are  short  and  easy  to  remember; 
others  are  more  difficult.  Some  require  direct  reasoning, 
while  others  are  of  the  nature  of  a  puzzle. 

The  answers  indicate  in  many  cases  that  the  pupils  fail 
to  answer  correctly  although  apparently  they  comprehend 
the  statement.  Since  the  test  is  not  designed  as  a  memory 
test,  a  test  in  reasoning,  or  a  test  in  solving  problems,  but  a 
test  in  which  "the  difficulty  of  each  exercise  must  depend 
upon  the  child's  interpreting  the  English  language,"  the  dif- 
ficulties connected  with  memorizing,  reasoning  or  the  solu- 
tion of  problems  should  be  kept  as  far  as  possible  on  an 
equal  plane  and  difficulties  in  vocabulary  or  in  construction 
should  be  the  basis  upon  which  the  increased  difficulty  of 
one  exercise  over  another  should  depend. 

Revision  of  a  few  of  the  exercises,  which  are  not  stated 
clearly,  would  add  value  to  the  scale.  It  is  wrong  to  rank 
answers  indicating  partial  comprehension  in  the  same  way 
as  those  indicating  no  comprehension  at  all.  Furthermore, 
the  test  would  be  much  more  valuable  if  it  had  been  planned 
to  reveal  specific  causes  of  strength  or  weakness  in  each  in- 
dividual effort.  It  is  impossible  to  tell  whether  low  scores 
are  due  to  slowness  or  to  lack  of  comprehension. 

The  test  is  definite  and  practical ;  it  takes  a  short  time  to 
give  and  can  be  given  to  large  numbers  at  the  same  time.  In 
spite  of  its  limitations  it  furnishes  instructive  data. 

The  results  of  the  tests  given  in  the  Demonstration  School 
closely  approximate  the  results  established  by  Kelly  in  an 
examination  of  9,252  children  in  19  cities  of  Kansas.  The 
median  for  the  4th  grade  was  .2  higher  than  the  standard; 
the  median  for  the  5th  grade  was  .6  higher  than  the  stand- 
ard; for  the  6th  grade  it  was  .4  lower  and  for  the  7th  1.7 
lower. 

There  was  great  variability  in  ability  among  the  boys, 
among  the  girls,  and  in  the  classes  as  a  whole.  In  each  class 
the  average  and  median  scores  for  the  boys  were  higher 
than  those  for  the  girls. 

In the 4th grade 40 per cent of the boys and 60 per cent
of the girls were below the class median; in the 5th grade
47.3 per cent of the boys and 55.5 per cent of the girls were
below the class median; in the 6th grade 40 per cent of the
boys and 56.3 per cent of the girls were below the class
median; in the 7th grade 42.1 per cent of the boys and 55.5
per cent of the girls were below the class median.

The overlapping of grades was very noticeable, particularly
in the 6th and 7th grades. More than 45 per cent of the 6th
grade made a score higher than the median score of the 7th
grade. More than 40 per cent of the 7th grade made a score
lower than the median score of the 6th grade. Twenty per cent
of the 4th grade made a score higher than the median score
of the 5th grade, and 14 per cent of the 5th grade made a
score lower than the median score of the 4th grade. The
5th and 6th grades could not be compared because different
tests were given in these two grades.
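
The measure of overlapping used above can be sketched in Python. The two score lists are hypothetical and are not the Demonstration School data.

```python
from statistics import median

def percent_above(scores, threshold):
    """Per cent of scores strictly above the threshold."""
    return 100 * sum(s > threshold for s in scores) / len(scores)

def percent_below(scores, threshold):
    """Per cent of scores strictly below the threshold."""
    return 100 * sum(s < threshold for s in scores) / len(scores)

# Hypothetical scores for two adjacent grades given the same test.
sixth = [8, 10, 12, 13, 15, 16, 18, 20, 22, 25]
seventh = [7, 9, 11, 14, 15, 17, 19, 21, 24, 27]

sixth_above_seventh_median = percent_above(sixth, median(seventh))
seventh_below_sixth_median = percent_below(seventh, median(sixth))
print(sixth_above_seventh_median, seventh_below_sixth_median)
```

The comparison is meaningful only when both grades took the same set of exercises, which is why the 5th and 6th grades are not compared.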


CHART XIX. Silent Reading. Charts Showing Overlapping of Grades
[Panels: Fourth Grade, Fifth Grade, Sixth Grade, Seventh Grade; horizontal axis: No. Problems]


The value of the scores made did not vary directly with
age. In the test given to the 6th and 7th grades the highest
score was made by the youngest child and after passing the
normal age for children in these grades the average score
rapidly became lower. In the test given to the 4th and 5th
grades the average scores did not show such a steady decrease
for pupils above the normal age, but showed great
variability, though the average score for the highest age was
much lower than that for any other age.


VIII 

THE  STARCH  TEST  FOR  SPEED  AND  COMPRE- 
HENSION AND  THE  THORNDIKE  VISUAL 
VOCABULARY  TEST 

Byron J. Grimes

A. The Starch Test for Speed and Comprehension

The  reading  test,  Series  A,  used  in  this  survey  measures 
speed  and  comprehension  only.  The  speed  of  reading  is 
measured  by  the  number  of  words  of  a  certain  text  that  can 
be  read  in  one  second.  The  ability  to  reproduce  text  is  ac- 
cepted as  a  measure  of  comprehension.  To  state  this  more 
clearly:  the  number  of  words  written  immediately  after 
reading,  containing  or  reproducing  the  thought  of  the  text 
is  the  measure  of  comprehension. 

The  entire  test  consists  of  nine  pages  of  reading  matter 
suited  to  the  first  nine  school  grades  and  advancing  in  dif- 
ficulty from  one  selection  to  another  by  fairly  uniform  steps. 

Pupils  are  instructed  to  read  for  just  30  seconds  with  as 
much  speed  as  will  permit  of  their  understanding  what  is 
read.  It  must  be  made  clear  before  beginning  to  read  that 
they  will  be  required  to  turn  the  sheet  over  and  write  on  the 
back  as  much  of  the  story  as  they  can  recall.  Each  pupil 
must  begin  and  stop  his  reading  at  exactly  the  same  time. 
No  time  limit  is  set  for  reproduction. 
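
The two measures can be sketched in Python as follows. Speed is the number of words read divided by the 30 seconds allowed. The comprehension count below assumes that the scorer strikes out words that do not reproduce the thought of the text; the number struck out is a matter of judgment and is hypothetical here.

```python
def words_per_second(words_read: int, seconds: float = 30.0) -> float:
    """Reading speed as words read per second of the timed period."""
    return words_read / seconds

def comprehension_score(reproduction: str, discarded_words: int = 0) -> int:
    """Words written, less those the scorer judges not to reproduce the text."""
    return len(reproduction.split()) - discarded_words

# Hypothetical pupil who read 63 words in the 30 seconds.
speed = words_per_second(63)

# Hypothetical reproduction of 18 words, of which the scorer discards 6.
written = ("Once upon a time there was a rich man and a king "
           "who had a daughter named Midas")
comprehension = comprehension_score(written, discarded_words=6)
print(speed, comprehension)
```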

Illustration:  "Once  upon  a  time  there  was  a  rich  man 
and  a  king  who  had  a  daughter  named  Midas." 

It  has  not  yet  been  clearly  demonstrated  that  the  ability 
to  reproduce  in  writing  what  has  been  read  is  a  fair  estimate 
of  reading  ability.  Oral  reproduction,  while  offering  some 
difficulties  for  the  teacher,  would  simplify  the  process  for 
the  children  tested. 

The small number of words read by most children would
seem to indicate that a longer period than 30 seconds would
be preferable.

A  summary  of  the  results  of  the  Starch  Reading  test,  in 
median  scores,  is  as  follows : 


                 Speed   Deviation   Comprehension   Deviation   No. Words   Total
                         from                        from        Read        Score
                         Standard                    Standard

Fourth grade      2.1      -.3            29            +1          64         27
Fifth grade       2.5      -.3            28            -5          81         32
Sixth grade       2.6      -.6            24           -14          86         28
Seventh grade     2.1     -1.5            32           -13          62         29


It  is  evident  from  the  scores  shown  above  that  the  children 
of  this  school  are  below  standard  in  rate  of  reading  and  also 
in  comprehension. 
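
The standards against which the school is compared are not printed in the summary, but they can be recovered from it on the assumption that each deviation is the school median minus the standard. The Python sketch below also assumes a particular reading of the columns of the summary (speed, deviation, comprehension, deviation).

```python
# Median scores and deviations as read from the summary above.
table = {
    "Fourth":  {"speed": 2.1, "speed_dev": -0.3, "comp": 29, "comp_dev": 1},
    "Fifth":   {"speed": 2.5, "speed_dev": -0.3, "comp": 28, "comp_dev": -5},
    "Sixth":   {"speed": 2.6, "speed_dev": -0.6, "comp": 24, "comp_dev": -14},
    "Seventh": {"speed": 2.1, "speed_dev": -1.5, "comp": 32, "comp_dev": -13},
}

def implied_standard(score, deviation):
    """Standard implied by a score and its deviation (score minus standard)."""
    return round(score - deviation, 1)

standards = {
    grade: (implied_standard(row["speed"], row["speed_dev"]),
            implied_standard(row["comp"], row["comp_dev"]))
    for grade, row in table.items()
}
print(standards)
```

On this reading the implied standards rise steadily from grade to grade, and the fourth grade is the only one that reaches its standard in comprehension.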

This  is  further  evidence  of  the  generally  accepted  theory 
that  slow  reading  and  poor  comprehension  are  closely  related. 

A comparison of graphs 1 and 2 makes an interesting
study relative to the seventh grade. In speed this grade
shows a decided falling off, but in comprehension a relatively
good score. This ability to interpret may be due
to the enlarged experience and wider range of reading of
two and three, even four, additional years in school.


CHART XXI. Comprehension

Difficulty  was  experienced  in  scoring  for  comprehension. 
The  element  of  judgment  enters  so  largely  as  to  permit  of  a 
wide  range  of  variability.  Ideas  got  from  text  read  but 
changed  in  order  or  arrangement  could  not  be  considered 
reproduction. 

B.  Thorndike  Visual  Vocabulary  Test 

The  Thorndike  Visual  Vocabulary  Test  is  an  attempt  to 
measure  silent  reading  so  far  as  it  concerns  the  understand- 
ing of  words  singly,  unconfused  with  the  ability  to  express 
one's  self  orally  or  in  writing. 

The test consists of 9 lines of 5 words each, with the exception
of the last line which contains only 3 words. All
words on the same line are supposed to be equally hard to
understand and the difficulty increases in equal amounts
from line to line, except that the difference in difficulty
between the 8th line and the lines preceding and following it
is only half as great as the difference between any other two
succeeding lines. The test consists in the correct listing of
each of the words in the 9 lines according to a definite
classification laid down in the directions for taking the test.
The time element is not considered at all.

CHART XXIa. Distribution of Scores for the Trabue Completion Test.

The scale really measures the ability to understand printed
words only well enough to classify them and it does not test
a pupil's knowledge of a word in its natural setting. Other
limitations of the scale are: (1) undue predominance is given
to names of animals and flowers, and (2) pronouns,
conjunctions, prepositions, auxiliary verbs and other
words expressing relation are omitted.

CHART XXII. Achievement as Shown by Thorndike Test

There  is  a  close  relation  in  the  findings  of  all  three  tests. 
The  fourth  and  fifth  grades  are  making  fair  progress  while 
the  sixth  and  seventh  grades  are  much  below  standard. 

Starch's  test  for  speed  shows  that  the  entire  school  reads 
slowly,  which  may  account  for  poor  comprehension. 

That  the  entire  school  is  made  up  largely  of  retarded 
pupils  is  a  possible  explanation  of  the  low  scores  obtained. 
This  can  be  fairly  well  determined  by  a  study  of  the  tests  in 
all  school  subjects. 

Attention  is  called  to  the  decided  overlapping  in  grades. 
Of  the  31  pupils  in  the  sixth  grade  14  could  just  as  well  be 
placed  in  the  seventh ;  six  of  the  fourth  grade  could  do  the 
work  in  reading  of  the  fifth  grade. 


IX 

APPLICATION  OF  AYRES,  BUCKINGHAM,  AND 
STARCH  SCALES  IN  SPELLING 

Dorothy B. Berry

The  Ayres  Scale  is  based  upon  the  one  thousand  most 
common  words  in  the  English  language.  These  words  were 
selected  by  combining  the  results  of  four  previous  investi- 
gations, which  had  as  their  object  the  selection  of  the  words 
most  commonly  used  in  different  sorts  of  writings.  The 
first  study  was  made  by  the  Rev.  J.  Knowles  in  1904  in  a 
pamphlet  entitled,  "The  London  Point  System  of  Reading 
for  the  Blind."  From  passages  in  the  English  Bible  and 
from  various  authors,  containing  100,000  words,  a  list  was 
made  of  the  353  words  which  occurred  most  frequently. 
The second study was made by R. C. Eldridge and the results
were published in 1911 in "Six Thousand Common
English Words." The frequency of different words was
determined on the basis of an analysis of 250 articles taken from
issues of four Sunday newspapers in Buffalo. These articles,
counting repetitions, contained 43,989 words.

The third study was made by L. P. Ayres in 1913 and results
were published in "The Spelling Vocabularies of Personal
and Business Letters." The study consisted of the
tabulation of 200,000 words taken from family correspondence
of 13 adults. The total vocabulary consisted of 5,200
different words. The list of one thousand words finally
selected was determined by finding the frequency with
which each word appeared in the tabulation of each study,
weighting that frequency to the size of the base, adding the
four frequencies and finding their average.
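
The weighting and averaging described above might be sketched in Python as follows. The exact weighting Ayres used is not given here, so the sketch assumes that each count is reduced to a rate per 100,000 running words before averaging. The word counts are hypothetical, and only the three base sizes stated in this passage are used.

```python
def rate_per_100k(count, base_size):
    """Occurrences of a word per 100,000 running words of a study."""
    return 100_000 * count / base_size

def combined_frequency(counts_and_bases):
    """Average of the weighted frequencies across the studies supplied."""
    rates = [rate_per_100k(count, base) for count, base in counts_and_bases]
    return sum(rates) / len(rates)

# Hypothetical counts for one word, paired with the base sizes named in the text.
one_word = [
    (500, 100_000),   # study based on 100,000 words
    (220, 43_989),    # study based on 43,989 words
    (1000, 200_000),  # study based on 200,000 words
]
print(round(combined_frequency(one_word), 1))
```

Reducing each count to a common base keeps the largest study from dominating the average merely because it tabulated more words.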

The 1,000 words were first made up into 50 lists of 20
words each and these lists were then given to various grades
in the schools of 84 cities. The data secured from these
tests made an aggregate of 1,400,000 spellings by 70,000
children. The results constitute the basis of the scale.

The  scale  explains  itself.  It  is  divided  into  twenty-six 
columns  lettered  from  A  to  Z.  All  the  words  in  each  are  of 
approximately  equal  spelling  difficulty.  The  steps  in  spell- 
ing difficulty  from  each  column  to  the  next  are  approximately 
equal  steps.  The  numbers  at  the  top  of  the  scale  indicate 
about  what  per  cent  of  correct  spellings  may  be  expected 
among  the  children  of  the  different  grades.  By  means  of 
these  groupings  a  child's  spelling  may  be  located  in  terms  of 
grades. 

The Starch Tests were selected in the following manner:
The first defined word on every even-numbered page in
Webster's New International Dictionary was chosen, making
a total of 1186 words. From these all technical, scientific,
and obsolete words were discarded, leaving 600 words.
These were then arranged alphabetically in the order of size,
beginning with three-letter words and running to the longest. This
list was then divided into six lists of 100 words each, by
choosing for the first list the 1st, 7th, 13th, etc.; for the second
list the 2d, 8th, 14th, etc.; and so on until the sixth
list was completed. These tests have been standardized by
administering them to 2,500 pupils in 12 schools of 5 cities.
The average results have been tabulated.
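
The division into six lists can be sketched in Python. The ordering function assumes that "alphabetically in the order of size" means by length first and alphabetically within each length; positions 1 to 600 stand in for the actual words, which are not reproduced here.

```python
def order_by_length(words):
    """Shortest words first; alphabetical order within each length."""
    return sorted(words, key=lambda w: (len(w), w))

def deal_into_lists(ordered, n_lists=6):
    """List 1 takes items 1, 7, 13, ...; list 2 takes items 2, 8, 14, ...; and so on."""
    return [ordered[i::n_lists] for i in range(n_lists)]

# Positions 1..600 stand in for the 600 ordered words.
positions = list(range(1, 601))
lists = deal_into_lists(positions)
print(lists[0][:3], lists[1][:3], len(lists[5]))
```

Dealing the ordered words out in rotation gives each of the six lists nearly the same spread of word lengths.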

The Buckingham list was selected in the following manner:
From a list of 5,000 words, taken from five Spelling
Books, a list of 270 words was used for a test. This was
called the "Original List." These words had to satisfy the
following requirements: (1) All of them had to be words
in the speaking vocabulary of a third grade child, and (2)
the spelling difficulty of many of them had to be great enough to
test the ability of eighth grade children. These were then
placed in a continuous passage and the whole dictated to
different grades. Two measurements were recorded: (1)
the number of times each word was correctly spelled in each
grade, and (2) the percentage of the entire number of words
each pupil spelled correctly in each grade.


The basis upon which the "Selected List" was chosen is
as follows: Referring to the previous study it was seen that
the word "across" was spelled by 17 per cent of the third grade
children, which means that it was not too hard to serve as a
test of their ability. By the time the eighth grade was
reached it still served as a test of ability. Thus 100 words
were selected.

These words were again put into sentences and from the
data collected two lists were then selected, each containing
25 words, which show a regular increase in difficulty as we
pass from grade to grade. These are known as the "First
Preferred List" and "Second Preferred List." In this way
Buckingham has provided a basis of comparison, as a method
of testing the relative ability of different classes.

In the first Ayres' test the grades attained the average
score with only one deviation, of -9, in the fourth grade. In
the second Ayres' test there was only a slight negative deviation
in the three lower grades, but in the seventh grade
the deviation was +2.



No  word  in  the  Ayres'  tests  was  missed  more  than  forty 
times  nor  less  than  five  times.  In  the  Starch  test  some 
words  were  missed  120  times  or  more. 

The Starch average scores are rather low and none of
the grades attained the average. The deviation ranged from
-4 to -7.

In  constructing  a  test  for  any  grade  only  the  crucial  words 
should  be  used.  Crucial  words  are  believed  to  be  those 
which  may  be  spelled  by  50  per  cent  of  the  pupils  of  that 
grade,  or  those  words  of  approximately  equal  difficulty  of 
which  the  average  score  of  the  pupils  of  that  grade  will  be 
50  per  cent.  Buckingham  used  the  percentage.  In  this 
respect  the  Ayres  and  Buckingham  methods  are  superior  to 
the  Starch  method. 
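
The choice of crucial words for a grade can be sketched in Python. The tolerance of ten points on either side of 50 per cent is an assumption made for illustration, and the percentages are hypothetical.

```python
def crucial_words(percent_correct, target=50.0, tolerance=10.0):
    """Words spelled correctly by roughly half the pupils of a grade."""
    return sorted(
        word for word, pct in percent_correct.items()
        if abs(pct - target) <= tolerance
    )

# Hypothetical per cent of one grade spelling each word correctly.
grade_results = {"across": 48, "but": 98, "nunciature": 2, "separate": 55}
chosen = crucial_words(grade_results)
print(chosen)
```

Words that nearly every pupil spells, or that nearly none can spell, are passed over because they do little to separate the pupils of that grade.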

The Starch scale is merely a random selection of words
with no regard for the child's writing vocabulary. It seems
that any test would be valueless in testing words the child
has never studied. Such words as nunciature, bizarre, and
ineffectuality serve as good examples. In scoring, all words
are of the same value: "but" has the same value as "nunciature."

It  seems  that  spelling  ability  can  hardly  be  measured  by 
an  arbitrary  list  of  words.  No  list  of  50  words  is  sufficient 
to  test  spelling  ability.  The  Buckingham  scale  might  serve 
to  test  large  groups  of  children,  but  hardly  the  individual. 

Ayres  has  scaled  a  foundation  spelling  vocabulary  and  has 
presented  groups  of  words  of  equal  spelling  difficulty.  In 
this  respect  Ayres  is  superior  to  the  others  in  that  he  has 
presented  a  representative  basic  list,  consisting  of  1,000 
words. 

In  each  test,  the  boys  scored  as  high  as  the  girls  and  in 
several  instances  surpassed  them. 


X

THE TRABUE COMPLETION TEST

Maynard A. Clemens and Franklin E. Rathbun

As  defined  by  the  originator,  the  Trabue  sentence  com- 
pletion test  is  an  index  of  language  ability.  It  may  also  be 
considered  as  a  test  of  ability  in  logical  thinking.  Professor 
H.  Ebbinghaus  who  devised  the  paragraph  completion  test, 
of  which  the  present  test  is  a  direct  lineal  descendant,  stated 
that it constituted a real test of intelligence. Other psychologists
have identified it as a test of "association," "memory,"
and "imagination."

This  test  consists  of  a  number  of  sentences  having  one 
or  more  blank  spaces  where  words  have  been  omitted.  The 
students  are  called  upon  to  write  the  most  appropriate  words 
they  can  think  of  in  these  blanks.  No  list  of  words  has 
been  arbitrarily  determined  upon  in  advance  to  be  supplied ; 
hence,  in  most  cases,  there  is  an  option  of  several  words. 
The  sentences  are  of  progressive  difficulty,  permitting  only 
the  survival  of  the  fittest.  The  first  few  are  of  such  a  char- 
acter that  little  difficulty  will  be  experienced  in  supplying 
the  missing  word  even  by  the  lower  grade  students,  whereas 
the  last  often  baffle  the  wits  of  mature  men  and  women. 

An arbitrary system of scoring is employed. Mistakes of
orthography are not considered; simply the aptness of the
words filled in is judged. Considering carefully the context,
if the sentence has been completed satisfactorily, a score
of 2 is given; if slight grammatical mistakes occur or infelicitous
words have been used, a score of 1 is given; but
if a wrong word has been employed, making an utterly hopeless
expression, zero is assigned.
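
The scoring rule can be sketched in Python. The marks shown belong to a hypothetical pupil on a scale of ten sentences; the judgment of each sentence remains the scorer's.

```python
def trabue_score(marks):
    """Total of per-sentence marks: 2 satisfactory, 1 slight fault, 0 wrong."""
    if any(mark not in (0, 1, 2) for mark in marks):
        raise ValueError("each sentence must be marked 0, 1 or 2")
    return sum(marks)

# Hypothetical marks for one pupil on ten sentences.
pupil_marks = [2, 2, 2, 2, 1, 2, 1, 1, 0, 0]
total = trabue_score(pupil_marks)
highest_possible = 2 * len(pupil_marks)
print(total, highest_possible)
```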

In devising this test, Dr. Trabue took fifty-six incomplete
sentences of graduated difficulty and in 1914-15 secured results
from several thousand students of New York and New
Jersey. A careful evaluation of these sentences was made;
and, as a result, many were discarded. Twenty-four were
retained and graded, constituting a new test. This was
called scale A, and during 1915 it was given to 6,000 students
of New York, New Jersey and many middle Western
states.

Since then, scale A, which was too cumbersome and required
too much time for presentation, has been formulated
into a series of smaller scales of ten sentences each requiring
only 5 to 7 minutes for testing.

We  have  tested  with  scale  B  one  hundred  twenty-six  stu- 
dents distributed  in  the  fourth,  fifth,  sixth  and  seventh 
grades  of  the  Demonstration  School. 

In grading the papers the grading given by the Trabue
monograph was considered a final court of appeal. It is interesting
to note that the list of words given there as answers
of school children for this test conformed quite generally to
the list secured by us.

Following  are  the  general  results: 


Distribution of Total Scores for Sentences

Score             0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  Total
No. of students   0  0  0  0  1  1  2  0  5  7 17 12 11 18 21 16  9  5  0  1  0    126

Median 13.38
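The median can be checked from the frequency row. The following is a minimal sketch, not the authors' stated procedure: it assumes the counts as read from the distribution row above (scores 0 to 20), and it assumes the older convention in which a score s covers the interval from s up to s + 1. The function name is hypothetical.

```python
def interpolated_median(frequencies):
    """Median of whole-number scores, treating score s as the interval [s, s + 1).

    frequencies[s] is the number of pupils who made score s.
    """
    half = sum(frequencies) / 2
    below = 0  # pupils with scores lower than the current one
    for score, count in enumerate(frequencies):
        if count and below + count >= half:
            # Interpolate within the interval that contains the middle pupil.
            return score + (half - below) / count
        below += count
    raise ValueError("empty distribution")


# Counts for scores 0..20 as read from the distribution row (an assumption
# wherever the print is unclear).
counts = [0, 0, 0, 0, 1, 1, 2, 0, 5, 7, 17, 12, 11, 18, 21, 16, 9, 5, 0, 1, 0]

print(sum(counts))                           # number of pupils
print(round(interpolated_median(counts), 2))
```

Under these assumptions the total is 126 and the interpolated median falls at about 13.39, which agrees with the reported 13.38 to within rounding.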

The originator believes that this scale will mark quite
definitely the intelligence for each grade. This being true
we should expect a progressive increase in the average and
an advancement in the position of the median from group to
group. This rate of progress, too, should be fairly well
fixed.

In general these conclusions are substantiated by the
results secured by us.


Grade      Estimated Median   Actual Median   Difference   Average Score
Fourth            8.0             10.75          2.75+         10.
Fifth             9.6             11.5           1.9+          11.
Sixth            11.0             13.75          2.75+         13.
Seventh          12.3             14.63          2.3+          14.2

The better showing of these students than the calculated
estimates for similar grades is probably due to the
comparatively smaller number, which is less affected by
extremes, and to the fact that these pupils are about ready for
a grade higher than that in which they are now classified.

The value of the scale should be considered. Is it worth
while? Does it lay bare definite language faults to enable
a cure to be administered? It is doubtful if it brings results
which could be obtained by other tests. Certainly as a
language test, it leaves much to be desired. No provision has
been worked out for determining the results qualitatively.
With it, no one can exactly diagnose the student's troubles;
nor is the cure very plain. To obtain a higher score does a
student need more grammar, more reading, more spelling,
literature, a larger vocabulary? Probably all of these, as
higher scores seem to mark advanced education. It may
be better suited for testing general intelligence. Hence, we
must conclude that it is probably better correlated with other
tests. Alone, it is simply an index.


XI 

HILLEGAS  SCALE  FOR  THE  MEASUREMENT  OF 
QUALITY  IN  ENGLISH  COMPOSITION 

J.  B.  H.  Bowser  and  H.  L.  Rinehart 

This  test  in  composition  was  given  to  grades  four,  five, 
six,  and  seven  of  the  Johns  Hopkins  University  Summer 
Demonstration  School. 

A  few  minutes  were  allowed  the  pupils  to  place  on  the 
paper  the  name,  date,  age,  grade,  and  school.  The  subject 
was  The  Season  that  I  Like  Best  and  Why.  The  time  given 
for  writing  the  composition  was  fifteen  minutes. 

The  reading  and  evaluation  of  the  compositions  were  made 
independently  by  the  writers. 

Table 6 shows the qualities and steps into which each
judge placed each composition of the four grades. It will
be seen that, in many cases, the scores given by the
individual judges place the compositions in the same step.
When this did not occur, the average of the score given a
composition by the two judges was taken as the final score
for that paper and the composition was placed in the step to
which that final score belonged.
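The placement rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the report does not print the values of the steps, so the tuple below is an assumed set of rounded step values that should be replaced by those printed on the scale in use, and the function names are hypothetical.

```python
# Assumed step values for steps 0 through 9; the report itself does not list
# them, so substitute the values printed on the scale actually used.
ASSUMED_STEP_VALUES = (0, 18, 26, 37, 47, 58, 67, 77, 83, 93)


def nearest_step(score, step_values=ASSUMED_STEP_VALUES):
    """Return the number of the step whose value lies closest to the score."""
    return min(range(len(step_values)), key=lambda i: abs(step_values[i] - score))


def final_placement(first_judge, second_judge, step_values=ASSUMED_STEP_VALUES):
    """Return (final score, step) for one composition marked by two judges."""
    average = (first_judge + second_judge) / 2
    step_a = nearest_step(first_judge, step_values)
    step_b = nearest_step(second_judge, step_values)
    if step_a == step_b:
        # Both judges place the paper in the same step; that step stands.
        return average, step_a
    # Otherwise the average decides the step.
    return average, nearest_step(average, step_values)


print(final_placement(26, 26))  # judges agree
print(final_placement(0, 26))   # widest disagreement noted in the report
```

With the assumed values, two marks of 26 leave the paper in step 2, while marks of 0 and 26 average 13 and fall to step 1. A score exactly midway between two steps goes to the lower step in this sketch; the report does not say how such ties were settled.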

Table 7 gives the grade distribution, in which overlapping
of the scores in the several grades is apparent. The
Standard Medians as given by Starch are: for the fourth grade,
26; for the fifth grade, 31; for the sixth grade, 36; and for
the seventh grade, 41.

TABLE 6

Comparison of the Judges' Marks and the Final Step into which
Individual Compositions are Placed

Table 8 shows that the median of grade four is 23, a
deviation of -3 from the standard; of grade five, 30, a
deviation of -1; of grade six, 36, no deviation; and of
grade seven, 44, a deviation of +3. This table also shows
the medians as given by the individual judges. The medians
of the first judge show a deviation from the standards of
0 in the fourth grade, of -2 in the fifth grade, of -1 in
the sixth grade, and of zero in the seventh grade.

Those of the second judge show deviations of -5, -2,
0, and +2, for the fourth, fifth, sixth, and seventh grades,
respectively.
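The deviations of the grade medians are simple differences from the Starch standards. A minimal sketch, using only the standard medians and grade medians quoted above; the dictionary and function names are hypothetical.

```python
# Standard medians (Starch) and the grade medians found in this test,
# both as quoted in the text.
STANDARD_MEDIAN = {"IV": 26, "V": 31, "VI": 36, "VII": 41}
GRADE_MEDIAN = {"IV": 23, "V": 30, "VI": 36, "VII": 44}


def deviations(observed, standard):
    """Observed median minus standard median, grade by grade."""
    return {grade: observed[grade] - standard[grade] for grade in standard}


for grade, difference in deviations(GRADE_MEDIAN, STANDARD_MEDIAN).items():
    print(f"Grade {grade}: {difference:+d}")
```

The same function can be applied to either judge's medians by passing a different dictionary of observed values.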

TABLE  7 
Distribution  by  Grades 


Steps    0    1    2    3    4    5    6    7    8    9

Grade  IV 

I 

13 

5 

17 
10 
10 

0 

9 
14 
17 

6 
12 

0 
0 
2 

8 

0 
0 
0 
0 

0 
0 
0 
0 

0 
0 
0 

0 

0 

Grade  V 

0 

Grade  VI 

0 

Grade  VII 

0 

Table 9 shows a distribution of the papers by the
individual judges.

TABLE 8
Medians of Grades

Grade                     IV     V    VI   VII
Standard median           26    31    36    41
Grade median              23    30    36    44
First judge's median      26    29    35    41
Second judge's median     21    28    36    43

Table 6 shows that the widest deviation in qualities
assigned to any one paper by the individual judges is from
zero to 26; but it also shows that in many cases there is no
deviation whatever, or very slight deviation. This indicates
that although a wide range of individual judgments is
possible, the scale is an aid to the judgment in rating
compositions.

It is true that, because of the impossibility of eliminating
subjective reactions, one constantly feels a tendency to throw
aside the scale, and to use personal judgment instead; but
comparison of the medians in this test with the standard
medians will show that, although the scale is not a substitute
for judgment, it can be used as a guide in rating the
compositions.


TABLE 9

Distribution by Individual Judges

1. Grade IV

Steps    0    1    2    3    4    5    6    7    8    9

By first judge

I 

2 

15 

25 

13 

I 

I 

0  0 

1  0 

o 

0 

O 
0 

0 
0 

0 

By  second  judge 

0 

2. Grade V

By  first  judge 

O 
0 

6 

5 

10 

13 

9 

5 

I 

3 

o 

0 

0 
0 

O 
0 

0 
0 

o 

By  second  judge 

0 

3. Grade VI

By  first  judge 

0 
0 

3 

0 

9 
II 

13 
II 

I 

lO 

6 

0 

o 

0 

O 
0 

O 
0 

0 

By  second  judge 

0 

4. Grade VII

By first judge

0 
0 

0       21 

20 

i8 

5 

12 

8 
7 

I 

0 

I 

O 

0 
0 

0 

By  second  judge 

0 

o| 

0 

XII 

THE  USE  OF  THE  BALLOU   SCALE  ON  A   SET 

OF  COMPOSITIONS  WRITTEN  BY 

SEVENTH  GRADE  PUPILS 

Grace  E.  Manson  and  Louise  W.  Linthicum 

Educational scales have developed out of actual school
experience and in response to school needs. A study of the
practice of teachers in marking discloses: (1) wide
variability of standards from subject to subject and from school
to school; (2) a need of more definite and concrete
standards by which to measure school work. The purpose of the
Ballou scale is to create an objective standard for measuring
English compositions in order to make the judgments of
English teachers more uniform. This objective standard
shall serve as a basis for the exercise of subjective judgment.

The complete scale is composed of four separate scales:
one for narration, a second for description, a third for
exposition, and a fourth for argumentation. Each of these
scales is composed of six type compositions. The subject
of each is different. They are ranked approximately 95
per cent, 85 per cent, 75 per cent, 65 per cent, 55 per cent,
45 per cent. Under each composition is a series of
remarks made by the compilers under these headings:
"Merits," which tell the strong points; "Defects," which tell
the weak points; "Comparison," which justifies the position
of the given composition in the scale.
Each composition with rating is intended as an objective
measure for any composition work of eighth grade pupils.
The compilers believe it can be used to measure seventh and
ninth grade work as well.

In order to use the scale:

1. Find to which style of discourse the composition
belongs.

2. By comparison with the scale, roughly divide the
compositions into six groups, accrediting to them relative merits
as measured by the six types of the scale.

3. Then grade them in the class to which they belong.
For example, if there were five compositions in the A class, they
might be graded 93, 91, 90, 89, 87, according to their
individual merits and defects as measured by the scale, and as
compared with each other.

The  report  which  follows  is  an  account  of  the  use  of  this 
scale  on  a  set  of  compositions  written  by  seventh  grade 
pupils  in  the  Johns  Hopkins  Demonstration  School. 

The two students assigned to give the test selected six
suitable descriptive topics, after which one was to be selected
by the writers. In order that the class might be as little
disturbed as possible, the grade teacher was asked to have the
composition written. The class chose "A Fire Engine
House." The time used was twenty-two minutes.
Thirty-four compositions were written.

Each of the teachers making the test graded the papers
independently by the scale. As a comparative study, a class
of twenty-four English teachers in the Hopkins Summer
School was asked to mark the papers by the ordinary
percentage method.

1. To find the range of variations made by the class using
the percentile method. (See Table 10.)

2. To find the average grade given the papers by each of
the twenty-four readers. (See Table 11.)

3. To find the coefficients of correlation between the
average ranking of the class and each investigator; also the
coefficients of the two investigators. (See following
paragraph 6.)

4. To find and compare the median grade of the class with
the medians of each of the users of the scale. (See
following paragraph 5.)

5. To check the scores made in this test with the scores
made in the Port Townsend, Washington, test. (See
following paragraph 4.)
TABLE 11
Range of Scores

No. of Composition      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16

                       85   92   95   85   75   95   85   90   90   88   83   95   88   90   88   90
                       85   90   94   85   75   95   85   90   85   90   80   95   85   90   88   85
                       80   85   90   85   75   95   80   88   80   78   80   95   85   85   85   85
                       80   85   90   80   70   92   80   85   80   78   80   95   83   83   85   83
                       75   85   90   80   70   90   80   85   78   75   78   95   80   82   85   82
                       75   85   90   80   70   90   70   85   78   75   76   94   80   80   80   80
                       75   80   90   79   70   90   68   84   76   75   72   92   80   80   80   80
                       75   80   90   79   70   90   65   80   75   75   72   90   80   75   79   80
                       72   80   90   75   70   88   65   80   75   75   70   90   80   75   75   80
                       70   80   85   75   65   85   65   80   75   75   70   90   75   75   75   79
                       65   80   85   75   65   85   65   80   70   74   70   88   75   75   75   79
                       65   80   85   75   65   85   65   78   70   70   70   85   75   75   75   75
                       60   80   85   70   65   81   60   76   70   70   65   85   75   75   75   75
                       60   80   84   70   60   80   60   75   70   70   65   85   74   75   72   70
                       60   80   83   70   60   80   60   70   70   65   65   85   70   68   70   70
                       60   78   80   70   60   80   60   70   68   65   65   85   70   65   70   70
                       60   75   80   65   56   80   60   70   62   60   60   84   65   65   70   70
                       60   75   80   65   50   75   60   65   60   60   60   80   60   65   70   65
                       60   70   75   60   48   75   55   65   60   60   60   80   60   65   70   65
                       60   70   70   60   40   73   55   65   55   50   55   78   60   60   70   65
                       60   65   70   60   40   72   50   60   55   40   50   75   60   60   68   60
                       50   65   70   60   40   70   50   60   50   40   50   75   60   60   68   55
                       50   60   65   56   40   60   50   50   40   35   50   75   50   60   65   55
                       40   60   60   55   25   60   48   40   35   20   45   75   50   50   60   50

Class ave.           65.9 79.5 82.6 71.3 59.3 81.9 64.2 73.7 67.7 64.7 63.2 86.2 71.6 72.2 74.9 72.8
Range of var.          45   32   30   30   50   35   37   50   55   68   38   20   38   40   28   40

No. of Composition                18    19    20    21    22    23    24    25    26    27    28    29    30    31    32    33    34

                                  75    30    80
                                  75    25    75
                                  70    25    74
                                  70    20    70
                                  70          70
                                  67          70
                                  65          70
                                  65          70
                                  60          65
                                  60          65
                                  60          65
                                  60          60
                                  60          60
                                  60          60
                                  53          60
                                  50          60
                                  50          50
                                  50          50
                                  50          50
                                  50          48
                                  50          40
                                  50          40
                                  40          40
                                  40          25

80  80


Class ave. (Nos. 18 to 34)      58.4  11.6  59.0  77.3  65.8  70.2  66.0  73.5  75.8  61.4  76.3  72.2  74.5  47.0  57.7  68.4  70.2
Range of var. (Nos. 18 to 34)     35    30    55    65    55    40    43    38    34    55    35    55    70    73    45    45    48
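The two summary rows of the table are each a one-line computation on a column of marks. The sketch below is an illustration only: it assumes the twenty-four marks for composition No. 1 as read from the table, takes the highest and lowest marks for No. 31 from the discussion that follows, and uses hypothetical function names.

```python
def range_of_variation(marks):
    """Highest mark minus lowest mark given to one composition."""
    return max(marks) - min(marks)


def class_average(marks):
    """Arithmetic mean of the marks given to one composition."""
    return sum(marks) / len(marks)


# Marks of the twenty-four readers for composition No. 1, as read from the
# table (an assumption wherever the print is unclear).
no_1 = [85, 85, 80, 80, 75, 75, 75, 75, 72, 70, 65, 65,
        60, 60, 60, 60, 60, 60, 60, 60, 60, 50, 50, 40]

print(range_of_variation(no_1))        # range of variation for No. 1
print(round(class_average(no_1), 1))   # class average for No. 1

# For No. 31 only the extremes are quoted: highest 88, lowest 15.
print(range_of_variation([88, 15]))
```

On these figures No. 1 shows a range of 45 points and an average of 65.9, and No. 31 a range of 73 points, in agreement with the table.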
1. The range of variation for the twenty-four readers was
found to be very wide. The greatest was 73 points in No.
31; the highest mark given it being 88 per cent; the lowest,
15 per cent. The least variation was twenty points in No.
12. The problem here seems to be: What standards, if any,
had these twenty-four teachers in mind? Does this show
need of better standardization of the judgments of teachers?

2. There was no agreement as to the best composition.
The class, as an average, considered No. 12 best, with a grade
of 86; one writer took No. 26 and valued it at 86; the other
chose No. 3, and graded it at 80 per cent. The two testers
and the twenty-four readers chose No. 19 as the poorest.

3. The averages given by the twenty-four readers, not
marking by the scale, were very generally higher. These
marks made an average score of 76.2; the average of
one tester scored 61.23; the average of the other tester
scored 49.2.

4.  Starch,  in  his  Educational  Measurements,  says  that  in 
the  Port  Townsend,  Washington,  test,  the  following  scores 
were  made: 

Grade    5    6    7    8   10   11   12

Score   46   46   53   58   63   70   72

Comparing  these  scores  with  those  obtained  in  the  present 
test,  it  would  seem  that  the  twenty-four  readers  rated  them 
as  eleventh  grade  work;  one  teacher  rated  them  as  ninth 
grade  work,  and  the  other  rated  them  as  sixth  grade  work. 

5. The median of the class was found to be 70.5; of one
investigator, 59; of the other, 60.

6. The coefficient of correlation between the class and
each investigator was found to be higher than the correlation
between the investigators. The correlation between
Investigator 1 and class was .41. The correlation between
Investigator 2 and class was .56. The correlation between the
two investigators was .26. The correlation was found by
the Spearman formula.
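The report does not say which form of the Spearman formula was used. As a minimal sketch, assuming the rank-difference form rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), the coefficient between two sets of marks could be computed as below. The marks in the example are made up for illustration and are not taken from the test, the function names are hypothetical, and the formula is exact only when there are no tied marks.

```python
def average_ranks(marks):
    """Rank marks from highest (rank 1) to lowest; tied marks share the mean rank."""
    order = sorted(range(len(marks)), key=lambda i: marks[i], reverse=True)
    ranks = [0.0] * len(marks)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next mark equals the current one.
        while end + 1 < len(order) and marks[order[end + 1]] == marks[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(marks_a, marks_b):
    """Rank-difference correlation between two markings of the same papers."""
    if len(marks_a) != len(marks_b) or len(marks_a) < 2:
        raise ValueError("need two equally long lists of at least two marks")
    n = len(marks_a)
    ranks_a = average_ranks(marks_a)
    ranks_b = average_ranks(marks_b)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical marks given to five papers by two markers.
marker_one = [86, 80, 61, 49, 30]
marker_two = [70, 90, 65, 50, 20]
print(round(spearman_rho(marker_one, marker_two), 2))
```

In this made-up example the two markers differ only in the order of the first two papers, so the coefficient is high; coefficients as low as the .26 reported between the two investigators mean the rankings disagree on many papers.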

The  poor  correlation  between  the  two  investigators  is 
probably  due  to  three  main  causes : 



1.  The  fact  that  this  was  the  first  time  either  tester  had 
used  the  scale. 

2. The fact that one was more experienced, and had a
correspondingly keener judgment in evaluating
compositions.

3.  Certain  defects  in  the  scale  itself.  Among  the  obvious 
defects  are: 

(a) Lack of directions for giving the test. In this case a
serious complication arose. In the conference over the
respective marks, it was found that No. 31 was not on the
subject given. One graded it by the scale according to its
value as a composition; the other gave it zero because it was
off subject. It was decided that the composition would have
to be graded by the scale irrespective of whether or not it
was on the subject.

(b)  The  scale  does  not  tell  what  merits  were  considered, 
or  whether  or  not  all  defects  were  considered  of  equal  value. 

The two students making this test agree (1) that a scale
is of the highest value; (2) they think that the Ballou scale
has obvious merits, and just as obvious limitations. It is
good in that it limits its range to measuring work of one
grade, and in that it has a scale for all four forms of
discourse. (3) They think that it is perfectly clear that in this
case almost as much subjective element must have entered
into their markings by the scale as in the class markings by
percentile methods. They believe that the continued use of
this scale would fix a more definite standard in their own
minds, and with repeated use, their variation would be
removed. (4) They feel, too, that this removal of the
personal element altogether would not be a good thing. In the
case quoted above, the child gained a grading even when
intentionally or unintentionally passing off a substitute for the
real thing. We think that the child was the loser morally
and educationally.

4.  It  is  harder  to  mark  by  the  scale.  It  takes  more  time 
until  the  scale  becomes  absolutely  a  fixture  in  your  mind. 

5.  The  "  A  "  class  of  composition  seems  too  high. 
6. This scale would not be a fair basis of comparison for
two schools or two systems unless the test were given under
conditions as nearly like the conditions under which the
original compositions were written as possible.

The  writers  make  the  following  suggestions : 

1.  Have  a  set  of  directions  for  giving  the  test. 

2. Have the compilers explain what they are looking for;
then grade a system of papers as a guide.

3.  Have  a  series  of  compositions  on  the  same  subject. 
This  would  illustrate  the  degree  of  marking  better  and  be 
less  confusing. 

4. Compile a scale for the four years of high school
from material gathered from schools all over the United
States rather than from one community.